(57) Abstract



A sprite generation method used in video coding generates a sprite from the video objects in the frames of a video sequence. The 
method estimates the motion (1200) between a video object in a current frame and a sprite constructed from video objects for previous 
frames. Specifically, the method computes motion coefficients of a 2D transform that minimizes the intensity errors between pixels in 
the video object and corresponding pixels inside the sprite. The method uses the motion coefficients from the previous frame (1206) as 
a starting point for minimizing the intensity errors. After estimating the motion parameters for an object in the current frame, the method
transforms the video object to the coordinate system of the sprite. The method blends (1204) the warped pixels (1202) of the video object 
with the pixels at corresponding positions in the sprite using rounding average such that each video object in the video sequence provides 
substantially the same contribution to the sprite.

METHOD FOR GENERATING SPRITES FOR OBJECT-BASED CODING SYSTEMS
USING MASKS AND ROUNDING AVERAGE

FIELD OF THE INVENTION 

The invention relates to object-based video coding and, more specifically, relates to
generating an image called a sprite that includes pixel data used to reconstruct a video object in each
frame of a video sequence.

BACKGROUND OF THE INVENTION 

Full-motion video displays based upon analog video signals have long been available in the
form of television. With recent increases in computer processing capabilities and affordability,
full-motion video displays based upon digital video signals are becoming more widely available. Digital
video systems can provide significant improvements over conventional analog video systems in
creating, modifying, transmitting, storing, and playing full-motion video sequences.

Digital video displays include large numbers of image frames that are played or rendered
successively at frequencies of between 30 and 75 Hz. Each image frame is a still image formed from
an array of pixels based on the display resolution of a particular system. As examples, VHS-based
systems have display resolutions of 320x480 pixels, NTSC-based systems have display resolutions of
720x486 pixels, and high-definition television (HDTV) systems under development have display
resolutions of 1360x1024 pixels.

The amounts of raw digital information included in video sequences are massive. Storage 
and transmission of these amounts of video information is infeasible with conventional personal 
computer equipment. With reference to a digitized form of a relatively low resolution VHS image 
format having a 320x480 pixel resolution, a full-length motion picture of two hours in duration could
correspond to 100 gigabytes of digital video information. By comparison, conventional compact
optical disks have capacities of about 0.6 gigabytes, magnetic hard disks have capacities of 1-2 
gigabytes, and compact optical disks under development have capacities of up to 8 gigabytes. 
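
The 100 gigabyte figure can be checked with a short calculation. The sketch below assumes 24-bit color (3 bytes per pixel) and a frame rate of 30 frames per second, the low end of the range given above; the function name is illustrative only.

```python
def raw_video_bytes(width, height, bytes_per_pixel, frames_per_second, seconds):
    """Return the size in bytes of an uncompressed video sequence."""
    return width * height * bytes_per_pixel * frames_per_second * seconds

# Assumed parameters: 320x480 pixels, 3 bytes per pixel, 30 frames/s, two hours.
size = raw_video_bytes(320, 480, 3, 30, 2 * 60 * 60)
print(size / 1e9)  # roughly 99.5 gigabytes
```

Under these assumptions the raw sequence is more than 150 times larger than a 0.6 gigabyte compact optical disk.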

To address the limitations in storing or transmitting such massive amounts of digital video
information, various video compression standards or processes have been established, including
MPEG-1, MPEG-2, and H.26X. These conventional video compression techniques utilize
similarities between successive image frames, referred to as temporal or interframe correlation, to
provide interframe compression in which pixel-based representations of image frames are converted
to motion representations. In addition, the conventional video compression techniques utilize
similarities within image frames, referred to as spatial or intraframe correlation, to provide intraframe
compression in which the motion representations within an image frame are further compressed.
Intraframe compression is based upon conventional processes for compressing still images, such as
discrete cosine transform (DCT) encoding.

Although differing in specific implementations, the MPEG-1, MPEG-2, and H.26X video
compression standards are similar in a number of respects. The following description of the MPEG-2
video compression standard is generally applicable to the others.

MPEG-2 provides interframe compression and intraframe compression based upon square
blocks or arrays of pixels in video images. A video image is divided into transformation blocks
having dimensions of 16x16 pixels. For each transformation block T_N in an image frame N, a search
is performed across the image of a next successive video frame N+1 or immediately preceding image
frame N-1 (i.e., bidirectionally) to identify the most similar respective transformation blocks T_{N+1} or
T_{N-1}.

Ideally, and with reference to a search of the next successive image frame, the pixels in
transformation blocks T_N and T_{N+1} are identical, even if the transformation blocks have different
positions in their respective image frames. Under these circumstances, the pixel information in
transformation block T_{N+1} is redundant to that in transformation block T_N. Compression is achieved
by substituting the positional translation between transformation blocks T_N and T_{N+1} for the pixel
information in transformation block T_{N+1}. In this simplified example, a single translational vector
(DX,DY) is designated for the video information associated with the 256 pixels in transformation
block T_{N+1}.
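
The block search described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular standard: the sum of absolute differences as the similarity measure, the search radius, and the function name `match_block` are all assumptions made for the example.

```python
import numpy as np

def match_block(cur, ref, top, left, size=16, radius=4):
    """Exhaustively search `ref` for the block of `cur` at (top, left).

    Returns the translation (dx, dy) with the smallest sum of absolute
    differences (SAD), together with that SAD value.
    """
    block = cur[top:top + size, left:left + size].astype(np.int64)
    best, best_sad = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidate positions that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue
            cand = ref[y:y + size, x:x + size].astype(np.int64)
            sad = int(np.abs(block - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad
```

When the best match has a SAD of zero, the 256 pixels of the block are fully described by the single vector (dx, dy); any nonzero remainder corresponds to the transformation block error discussed next.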

Frequently, the video information (i.e., pixels) in the corresponding transformation blocks
T_N and T_{N+1} are not identical. The difference between them is designated a transformation block
error E, which often is significant. Although it is compressed by a conventional compression process 
such as discrete cosine transform (DCT) encoding, the transformation block error E is cumbersome 
and limits the extent (ratio) and the accuracy by which video signals can be compressed. 

Large transformation block errors E arise in block-based video compression methods for 
several reasons. The block-based motion estimation represents only translational motion between 
successive image frames. The only changes between corresponding transformation blocks T_N and
T_{N+1} that can be represented are changes in the relative positions of the transformation blocks. A
disadvantage of such representations is that full-motion video sequences frequently include complex
motions other than translation, such as rotation, magnification and shear. Representing such complex
motions with simple translational approximations results in significant errors.

Another aspect of video displays is that they typically include multiple image features or 
"video objects" that change or move relative to each other. Video objects may be distinct characters, 
articles, or scenery within a video display. With respect to a scene in a motion picture, for example,
each of the characters (i.e., actors) and articles (i.e., props) in the scene could be a different object. 
Each of these objects is represented by a non-rectangular set of pixels in each frame of a video 
sequence. 

The relative motion between objects in a video sequence is another source of significant
transformation block errors E in conventional video compression processes. Due to the regular
configuration and size of the transformation blocks, many of them encompass portions of different
objects. Relative motion between the objects during successive image frames can result in extremely
low correlation (i.e., high transformation errors E) between corresponding transformation blocks.
Similarly, the appearance of portions of objects in successive image frames (e.g., when a character
turns) also introduces high transformation errors E.

Conventional video compression methods appear to be inherently limited due to the size of 
transformation errors E. With the increased demand for digital video display capabilities, improved 
digital video compression processes are required. 

SUMMARY OF THE INVENTION 

The invention is a method for generating sprites used in object-based video coding systems. 
In contrast to conventional video coding systems, object-based video coding systems code the video
objects in a video sequence separately. A "sprite" in object-based video coding is a representative
image of a video object collected from a video sequence. The process of generating a sprite refers to
building a sprite and computing the motion parameters from video objects in a video sequence so that
the video object for each frame can be reconstructed from the sprite and the motion parameters.

The sprite generation method includes three principal steps. First, the method estimates the
motion of a video object in the current frame relative to a sprite or reference video object
constructed for a previous frame. The result of this step, which is referred to as global motion 
estimation, is a set of motion parameters that define the motion of the video object in the current 
frame relative to the sprite or reference object. Next, the method transforms or "warps" the video 
object for the current frame into the sprite or reference object using the motion parameters. 
Specifically, for each pixel in the video object, the method computes a corresponding position in the 
sprite or reference object by transforming each pixel to the coordinate space of the sprite. Finally, 
the method blends the current video object with the sprite or reference object. In blending the sprite 
and video object, the method combines pixels in the video object with pixels of the sprite located at 
corresponding positions as determined by the motion parameters. This process is repeated for the 
video object of each frame. In summary, the method incrementally constructs a sprite by blending 
the pixels of the video object from each frame with corresponding pixels in the sprite (or reference 
object). 

One aspect of the invention is a method for blending a video object with a sprite or 
reference object using rounding average. The method uses rounding average so that each video 
object has substantially the same contribution to the sprite. When blending each video object to the 
previously constructed sprite, the method weights the sprite pixels in proportion to the number of 
video objects from which it is constructed. This ensures that the video objects from each frame 
supply substantially the same contribution to the sprite. 
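
The weighting described above can be sketched as a running average kept per pixel. The code is only an illustration under stated assumptions: the exact rounding rule (round to nearest by adding half the divisor before integer division), the per-pixel `counts` array, and the function name are not taken from this description.

```python
import numpy as np

def blend_rounding_average(sprite, counts, warped, warped_mask):
    """Blend warped object pixels into the sprite with a rounded running mean.

    sprite      : integer array of current sprite intensities
    counts      : integer array, number of objects already blended at each pixel
    warped      : integer array of object pixels already warped to sprite coordinates
    warped_mask : boolean array, True where the warped object supplies a pixel
    """
    sprite = sprite.astype(np.int64)
    total = counts * sprite + warped            # sprite weighted by its history
    new_counts = counts + 1
    averaged = (total + new_counts // 2) // new_counts   # round to nearest
    out_sprite = np.where(warped_mask, averaged, sprite)
    out_counts = np.where(warped_mask, new_counts, counts)
    return out_sprite, out_counts
```

Because the sprite pixel is multiplied by the number of objects already blended into it, the object from frame 1 and the object from frame 50 end up with approximately the same weight, which a fixed 50/50 average would not achieve.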

Another aspect of the invention is a method for performing global motion estimation for 
each video object using the motion of the object from the previous frame as a starting point. In one 
implementation of the motion estimation component, the method computes the motion coefficients
that map the current video object into the previously constructed sprite by minimizing the intensity
errors over all corresponding pairs of pixels in both the video object and the previously constructed 
sprite. This component includes a search for a set of motion coefficients that transform the pixels in 
the video object to corresponding pixels in the sprite to minimize the sum of the errors between the 
pixels of the video object and the corresponding pixels of the sprite. By starting with the motion 
coefficients from the previous frame, the search for motion coefficients in the current frame is more 
likely to find a set of motion coefficients that minimizes the error between the pixels in the video 
object and the corresponding pixels in the sprite or reference object. 
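
The benefit of starting from the previous frame's coefficients can be illustrated with any local search. The sketch below uses a simple greedy coordinate search, which is a stand-in chosen for brevity and is not the minimization procedure of the invention; the function name, step sizes, and the toy error function in the usage lines are all hypothetical.

```python
def refine_motion(error, start, step=1.0, min_step=1e-3):
    """Greedy coordinate search for coefficients that reduce `error`.

    `start` is the initial guess, for example the coefficients estimated
    for the previous frame. `error` maps a list of coefficients to a number.
    """
    coeffs = list(start)
    best = error(coeffs)
    while step >= min_step:
        improved = False
        for i in range(len(coeffs)):
            for delta in (step, -step):
                trial = list(coeffs)
                trial[i] += delta
                e = error(trial)
                if e < best:
                    coeffs, best, improved = trial, e, True
        if not improved:
            step /= 2.0          # shrink the step once no neighbor is better
    return coeffs, best

# Toy usage: the "true" motion is (3.25, -1.5); the previous frame gave (3.0, -1.0).
toy_error = lambda c: (c[0] - 3.25) ** 2 + (c[1] + 1.5) ** 2
coeffs, err = refine_motion(toy_error, [3.0, -1.0])
```

A local search of this kind only finds the minimum nearest its starting point, which is why a starting point close to the true motion, such as the previous frame's estimate, makes a good result more likely.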

In computing the motion parameters for each frame, the method uses the masks of the object 
and the sprite (or reference object) to ensure that it estimates motion based only on pixels in the video 
object that map to pixel locations inside the sprite or reference object. This approach reduces the 
error in the global motion estimation process and ensures that the motion of a video object is based 
solely on the motion of the object relative to the sprite.
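
The use of the two masks can be sketched as an error function that skips every pixel falling outside either mask. The six-coefficient affine form, nearest-neighbor sampling, the squared-error measure, and the function name are assumptions made for this illustration.

```python
import numpy as np

def masked_intensity_error(obj, obj_mask, sprite, sprite_mask, coeffs):
    """Sum of squared intensity errors over object pixels that land inside the sprite.

    coeffs = (a, b, c, d, e, f) maps an object pixel (x, y) to the sprite
    position (a*x + b*y + c, d*x + e*y + f), rounded to the nearest pixel.
    Returns (error, number of pixels that contributed).
    """
    a, b, c, d, e, f = coeffs
    ys, xs = np.nonzero(obj_mask)                   # pixels inside the object mask
    sx = np.rint(a * xs + b * ys + c).astype(int)
    sy = np.rint(d * xs + e * ys + f).astype(int)
    h, w = sprite.shape
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    xs, ys, sx, sy = xs[inside], ys[inside], sx[inside], sy[inside]
    valid = sprite_mask[sy, sx].astype(bool)        # must also hit the sprite mask
    diff = (obj[ys[valid], xs[valid]].astype(np.int64)
            - sprite[sy[valid], sx[valid]].astype(np.int64))
    return int((diff ** 2).sum()), int(valid.sum())
```

Pixels of the object that map outside the sprite mask contribute nothing, so background or not-yet-built regions of the sprite cannot distort the motion estimate.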

The sprite generation method summarized above can be implemented in a stand-alone 
module that generates off-line static sprites, as well as a module integrated with an encoder and 
decoder to generate on-line, dynamic sprites. The invention can be implemented in a software 
module executed in a computer or in special purpose video coding hardware. 

Further features and advantages of the invention will become apparent with reference to the 
following detailed description and accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram of a computer system that may be used to implement a method and 
apparatus embodying the invention. 

Figs. 2A and 2B are simplified representations of a display screen of a video display device 
showing two successive image frames corresponding to a video signal. 

Fig. 3A is a generalized functional block diagram of a video compression encoder process 
for compressing digitized video signals representing display motion in video sequences of multiple 
image frames. Fig. 3B is a functional block diagram of a master object encoder process according to 
this invention. 

Fig. 4 is a functional block diagram of an object segmentation process for segmenting 
selected objects from an image frame of a video sequence. 

Fig. 5A is a simplified representation of a display screen of the video display device of Fig. 2A,
and Fig. 5B is an enlarged representation of a portion of the display screen of Fig. 5A. 

Fig. 6 is a functional block diagram of a polygon match process for determining a motion 
vector for corresponding pairs of pixels in corresponding objects in successive image frames. 

Figs. 7A and 7B are simplified representations of a display screen showing two successive 
image frames with two corresponding objects.

Fig. 8 is a functional block diagram of an alternative pixel block correlation process.

Fig. 9A is a schematic representation of a first pixel block used for identifying
corresponding pixels in different image frames. Fig. 9B is a schematic representation of an array of
pixels corresponding to a search area in a prior image frame where corresponding pixels are sought.
Figs. 9C-9G are schematic representations of the first pixel block being scanned across the pixel
array of Fig. 9B to identify corresponding pixels.

Fig. 10A is a schematic representation of a second pixel block used for identifying
corresponding pixels in different image frames. Figs. 10B-10F are schematic representations of the
second pixel block being scanned across the pixel array of Fig. 9B to identify corresponding pixels.

Fig. 11A is a schematic representation of a third pixel block used for identifying
corresponding pixels in different image frames. Figs. 11B-11F are schematic representations of the
third pixel block being scanned across the pixel array of Fig. 9B.

Fig. 12 is a functional block diagram of a multi-dimensional transformation method that
includes generating a mapping between objects in first and second successive image frames and
quantizing the mapping for transmission or storage.

Fig. 13 is a simplified representation of a display screen showing the image frame of Fig. 7B 
for purposes of illustrating the multi-dimensional transformation method of Fig. 12. 

Fig. 14 is an enlarged simplified representation showing three selected pixels of a
transformation block used in the quantization of affine transformation coefficients determined by the
method of Fig. 12.

Fig. 15 is a functional block diagram of a transformation block optimization method utilized 
in an alternative embodiment of the multi-dimensional transformation method of Fig. 12. 

Fig. 16 is a simplified fragmentary representation of a display screen showing the image
frame of Fig. 7B for purposes of illustrating the transformation block optimization method of Fig. 15.

Figs. 17A and 17B are a functional block diagram of a precompression extrapolation
method for extrapolating image features of arbitrary configuration to a predefined configuration to
facilitate compression.

Figs. 18A-18D are representations of a display screen on which a simple object is rendered
to show various aspects of the extrapolation method of Fig. 14.

Figs. 19A and 19B are functional block diagrams of an encoder method and a decoder
method, respectively, employing a Laplacian pyramid encoder method in accordance with this
invention.

Figs. 20A-20D are simplified representations of the color component values of an arbitrary
set or array of pixels processed according to the encoder process of Fig. 19A.

Fig. 21 is a functional block diagram of a motion vector encoding process according to this
invention.

Fig. 22 is a functional block diagram of an alternative quantized object encoder-decoder
process.

Fig. 23A is a generalized functional block diagram of a video compression decoder process 
matched to the encoder process of Fig. 3. Fig. 23B is a functional diagram of a master object 
decoder process according to this invention. 

Fig. 24A is a diagrammatic representation of a conventional chain code format. Fig. 24B is
a simplified representation of an exemplary contour for processing with the chain code format of
Fig. 24A.

Fig. 25A is a functional block diagram of a chain coding process of this invention. 
Fig. 25B is a diagrammatic representation of a chain code format. 

Fig. 25C is a diagrammatic representation of special case chain code modifications used in 
the process of Fig. 25A. 

Fig. 26 is a functional block diagram of a sprite generating or encoding process. 
Figs. 27A and 27B are respective first and second objects defined by bitmaps and showing 
grids of triangles superimposed over the objects in accordance with the process of Fig. 26. 

Fig. 28 is a functional block diagram of a sprite decoding process corresponding to the 
encoding process of Fig. 26. 

Fig. 29 is a functional block diagram of a simplified decoding process for fully-defined
objects.

Fig. 30 illustrates an example showing how to generate a sprite from a video sequence. 
Fig. 31 is a block diagram illustrating a sprite generator.

Fig. 32 is a flow diagram illustrating a method for generating sprites for an object-based
coding system. 

Fig. 33 is a flow diagram illustrating a method for global motion estimation. 

DETAILED DESCRIPTION

Referring to Fig. 1, an operating environment for the preferred embodiment of the present 
invention is a computer system 20, either of a general purpose or a dedicated type, that comprises at 
least one high speed processing unit (CPU) 22, in conjunction with a memory system 24, an input 
device 26, and an output device 28. These elements are interconnected by a bus structure 30. 

The illustrated CPU 22 is of familiar design and includes an ALU 32 for performing
computations, a collection of registers 34 for temporary storage of data and instructions, and a
control unit 36 for controlling operation of the system 20. CPU 22 may be a processor having any of
a variety of architectures including Alpha from Digital, MIPS from MIPS Technology, NEC, IDT,
Siemens, and others, x86 from Intel and others, including Cyrix, AMD, and Nexgen, and the
PowerPC from IBM and Motorola.

The memory system 24 includes main memory 38 and secondary storage 40. Illustrated 
main memory 38 takes the form of 16 megabytes of semiconductor RAM memory. Secondary
storage 40 takes the form of long term storage, such as ROM, optical or magnetic disks, flash
memory, or tape. Those skilled in the art will appreciate that memory system 24 may comprise many 
other alternative components. 

The input and output devices 26, 28 are also familiar. The input device 26 can comprise a
keyboard, a mouse, a physical transducer (e.g., a microphone), etc. The output device 28 can
comprise a display, a printer, a transducer (e.g., a speaker), etc. Some devices, such as a network
interface or a modem, can be used as input and/or output devices.

As is familiar to those skilled in the art, the computer system 20 further includes an
operating system and at least one application program. The operating system is the set of software
which controls the computer system's operation and the allocation of resources. The application
program is the set of software that performs a task desired by the user, making use of computer
resources made available through the operating system. Both are resident in the illustrated memory
system 24.

In accordance with the practices of persons skilled in the art of computer programming, the
present invention is described below with reference to symbolic representations of operations that are
performed by computer system 20, unless indicated otherwise. Such operations are sometimes
referred to as being computer-executed. It will be appreciated that the operations which are
symbolically represented include the manipulation by CPU 22 of electrical signals representing data
bits and the maintenance of data bits at memory locations in memory system 24, as well as other
processing of signals. The memory locations where data bits are maintained are physical locations
that have particular electrical, magnetic, or optical properties corresponding to the data bits.

Figs. 2A and 2B are simplified representations of a display screen 50 of a video display
device 52 (e.g., a television or a computer monitor) showing two successive image frames 54a and
54b of a video image sequence represented electronically by a corresponding video signal. Video
signals may be in any of a variety of video signal formats including analog television video formats
such as NTSC, PAL, and SECAM, and pixelated or digitized video signal formats typically used in
computer displays, such as VGA, CGA, and EGA. Preferably, the video signals corresponding to
image frames are of a digitized video signal format, either as originally generated or by conversion
from an analog video signal format, as is known in the art.

Image frames 54a and 54b each include a rectangular solid image feature 56 and a pyramid
image feature 58 that are positioned over a background 60. Image features 56 and 58 in image
frames 54a and 54b have different appearances because different parts are obscured and shown. For
purposes of the following description, the particular form of an image feature in an image frame is
referred to as an object or, alternatively, a mask. Accordingly, rectangular solid image feature 56 is
shown as rectangular solid objects 56a and 56b in respective image frames 54a and 54b, and pyramid
image feature 58 is shown as pyramid objects 58a and 58b in respective image frames 54a and 54b.

Pyramid image feature 58 is shown with the same position and orientation in image frames
54a and 54b and would "appear" to be motionless when shown in the video sequence. Rectangular
solid 56 is shown in frames 54a and 54b with a different orientation and position relative to pyramid 
58 and would "appear" to be moving and rotating relative to pyramid 58 when shown in the video 
sequence. These appearances of image features 58 and 60 are figurative and exaggerated. The 
image frames of a video sequence typically are displayed at rates in the range of 30-80 Hz. Human 
perception of video motion typically requires more than two image frames. Image frames 54a and 
54b provide, therefore, a simplified representation of a conventional video sequence for purposes of 
illustrating the present invention. Moreover, it will be appreciated that the present invention is in no
way limited to such simplified video images, image features, or sequences and, to the contrary, is
applicable to video images and sequences of arbitrary complexity.

Video Compression Encoder Process Overview

Fig. 3A is a generalized functional block diagram of a video compression encoder process 
64 for compressing digitized video signals representing display motion in video sequences of 
multiple image frames. Compression of video information (i.e., video sequences or signals) can 
provide economical storage and transmission of digital video information in applications that include,
for example, interactive or digital television and multimedia computer applications. For purposes of
brevity, the reference numerals assigned to function blocks of encoder process 64 are used 
interchangeably in reference to the results generated by the function blocks. 

Conventional video compression techniques utilize similarities between successive image 
frames, referred to as temporal or interframe correlation, to provide interframe compression in which 
pixel-based representations of image frames are converted to motion representations. In addition, 
conventional video compression techniques utilize similarities within image frames, referred to as 
spatial or intraframe correlation, to provide intraframe compression in which the motion 
representations within an image frame are further compressed. 

In such conventional video compression techniques, including MPEG-1, MPEG-2, and 
H.26X, the temporal and spatial correlations are determined relative to simple translations of fixed, 
regular (e.g., square) arrays of pixels. Video information commonly includes, however, arbitrary 
video motion that cannot be represented accurately by translating square arrays of pixels. As a 
consequence, conventional video compression techniques typically include significant error 
components that limit the compression rate and accuracy. 

In contrast, encoder process 64 utilizes object-based video compression to improve the 
accuracy and versatility of encoding interframe motion and intraframe image features. Encoder 
process 64 compresses video information relative to objects of arbitrary configurations, rather than 
fixed, regular arrays of pixels. This reduces the error components and thereby improves the 
compression efficiency and accuracy. As another benefit, object-based video compression provides 
interactive video editing capabilities for processing compressed video information. 

Referring to Fig. 3A, function block 66 indicates that user-defined objects within image
frames of a video sequence are segmented from other objects within the image frames. The objects
may be of arbitrary configuration and preferably represent distinct image features in a display image.
Segmentation includes identifying the pixels in the image frames corresponding to the objects. The
user-defined objects are defined in each of the image frames in the video sequence. In Figs. 2A and
2B, for example, rectangular solid objects 56a and 56b and pyramid objects 58a and 58b are
separately segmented.

The segmented objects are represented by binary or multi-bit (e.g., 8-bit) "alpha channel"
masks of the objects. The object masks indicate the size, configuration, and position of an object on
a pixel-by-pixel basis. For purposes of simplicity, the following description is directed to binary
masks in which each pixel of the object is represented by a single binary bit rather than the typical
24 bits (i.e., 8 bits for each of three color component values). Multi-bit (e.g., 8-bit) masks also have
been used.

Function block 68 indicates that "feature points" of each object are defined by a user.
Feature points preferably are distinctive features or aspects of the object. For example, corners
70a-70c and corners 72a-72c could be defined by a user as feature points of rectangular solid 56 and
pyramid 58, respectively. The pixels corresponding to each object mask and its feature points in
each image frame are stored in an object database included in memory system 24.

Function block 74 indicates that changes in the positions of feature points in successive
image frames are identified and trajectories determined for the feature points between successive
image frames. The trajectories represent the direction and extent of movement of the feature points.
Function block 76 indicates that trajectories of the feature points in the object between prior frame
N-1 and current frame N also are retrieved from the object database.

Function block 78 indicates that a sparse motion transformation is determined for the object
between prior frame N-1 and current frame N. The sparse motion transformation is based upon the
feature point trajectories between frames N-1 and N. The sparse motion transformation provides an
approximation of the change of the object between prior frame N-1 and current frame N.
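
One way to obtain a transformation from a handful of feature point trajectories is a least-squares fit. The sketch below assumes an affine model for the sparse motion transformation, which this passage does not specify; the function name and the sample coordinates are hypothetical.

```python
import numpy as np

def fit_affine(prev_pts, cur_pts):
    """Least-squares affine map taking feature points in frame N-1 to frame N.

    Returns a 2x3 array [[a, b, c], [d, e, f]] such that
    x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    cur_pts = np.asarray(cur_pts, dtype=float)
    design = np.hstack([prev_pts, np.ones((len(prev_pts), 1))])   # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, cur_pts, rcond=None)     # shape (3, 2)
    return params.T

# Three feature points (e.g., three corners) tracked from frame N-1 to frame N.
prev_corners = [(0, 0), (1, 0), (0, 1)]
cur_corners = [(2, 3), (4, 3), (2, 5)]
sparse_transform = fit_affine(prev_corners, cur_corners)
```

Three non-collinear points determine an affine map exactly; with more feature points the fit averages out small tracking errors, but it still only approximates the change of the whole object.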

Function block 80 indicates that a mask of an object in a current frame N is retrieved from
the object database in memory system 24.

Function block 90 indicates that a quantized master object or "sprite" is formed from the
objects or masks 66 corresponding to an image feature in an image frame sequence and feature point
trajectories 74. The master object preferably includes all of the aspects or features of an object as it
is represented in multiple frames. With reference to Figs. 2A and 2B, for example, rectangular solid
56 in frame 54b includes a side 78b not shown in frame 54a. Similarly, rectangular solid 56 includes
a side 78a in frame 54a not shown in frame 54b. The master object for rectangular solid 56 includes
both sides 78a and 78b.

Sparse motion transformation 78 frequently will not provide a complete representation of
the change in the object between frames N-1 and N. For example, an object in a prior frame N-1,
such as rectangular object 54a, might not include all the features of the object in the current frame N,
such as side 78b of rectangular object 54b.

To improve the accuracy of the transformation, therefore, an intersection of the masks of the
object in prior frame N-1 and current frame N is determined, such as by a logical AND function as is
known in the art. The mask of the object in the current frame N is subtracted from the resulting
intersection to identify any portions or features of the object in the current frame N not included in
the object in the prior frame N-1 (e.g., side 78b of rectangular object 54b, as described above). The
newly identified portions of the object are incorporated into master object 90 so that it includes a
complete representation of the object in frames N-1 and N.
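
With binary masks, the intersection and the identification of new portions reduce to element-wise logical operations. The sketch assumes both masks are already expressed in the same coordinate frame and interprets the subtraction as "pixels in the current mask that are not in the intersection"; the function name and the small sample masks are illustrative.

```python
import numpy as np

def new_portions(prev_mask, cur_mask):
    """Return the pixels of the current mask that are absent from the prior mask."""
    common = np.logical_and(prev_mask, cur_mask)            # intersection (logical AND)
    return np.logical_and(cur_mask, np.logical_not(common)) # current minus intersection

prev_mask = np.array([[1, 1, 0],
                      [1, 1, 0]], dtype=bool)
cur_mask = np.array([[0, 1, 1],
                     [0, 1, 1]], dtype=bool)
added = new_portions(prev_mask, cur_mask)   # True only in the right-hand column
```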

Function block 96 indicates that a quantized form of an object 98 in a prior frame N-1 (e.g.,
rectangular solid object 56a in image frame 54a) is transformed by a dense motion transformation to
provide a predicted form of the object 102 in a current frame N (e.g., rectangular solid object 56b in
image frame 54b). This transformation provides object-based interframe compression.

The dense motion transformation preferably includes determining an affine transformation
between quantized prior object 98 in frame N-1 and the object in the current frame N and applying
the affine transformation to quantized prior object 98. The preferred affine transformation is
represented by affine transformation coefficients 104 and is capable of describing translation,
rotation, magnification, and shear. The affine transformation is determined from a dense motion
estimation, preferably including a pixel-by-pixel mapping, between prior quantized object 98 and the
object in the current frame N.
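
The four kinds of motion named above fit into six affine coefficients. The following sketch composes them into a 2x3 matrix and applies it to pixel coordinates; the parameterization (scale, angle, shear, tx, ty), the order of composition, and the function names are assumptions for illustration and do not describe how coefficients 104 are derived.

```python
import numpy as np

def affine_matrix(scale=1.0, angle=0.0, shear=0.0, tx=0.0, ty=0.0):
    """Compose shear, rotation, magnification, and translation into a 2x3 matrix."""
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    shr = np.array([[1.0, shear],
                    [0.0, 1.0]])
    linear = (rot @ shr) * scale
    return np.hstack([linear, np.array([[tx], [ty]])])

def apply_affine(matrix, points):
    """Map an array of (x, y) points through the 2x3 affine matrix."""
    pts = np.asarray(points, dtype=float)
    return pts @ matrix[:, :2].T + matrix[:, 2]

# Rotate by 90 degrees, magnify by 2, then shift one pixel to the right.
m = affine_matrix(scale=2.0, angle=np.pi / 2, tx=1.0)
moved = apply_affine(m, [(1.0, 0.0)])
```

A block-based coder would need a separate translation vector for every block to approximate this motion, whereas here six numbers describe it for the entire object.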

Predicted current object 102 is represented by quantized prior object 98, as modified by 
dense motion transformation 96, and is capable of representing relatively complex motion, together 
with any new image aspects obtained from master object 90. Such object-based representations are 
relatively accurate because the perceptual and spatial continuity associated with objects eliminates 
errors arising from the typically changing relationships between different objects in different image 
frames. Moreover, the object-based representations allow a user to represent different objects with 
different levels of resolution to optimize the relative efficiency and accuracy for representing objects 
of varying complexity. 

Function block 106 indicates that for image frame N, predicted current object 102 is 
subtracted from original object 108 for current frame N to determine an estimated error 110 in 
predicted object 102. Estimated error 110 is a compressed representation of current object 108 in 
image frame N relative to quantized prior object 98. More specifically, current object 108 may be 
decoded or reconstructed from estimated error 110 and quantized prior object 98. 

Function block 112 indicates that estimated error 110 is compressed or "coded" by a 
conventional "lossy" still image compression method such as lattice subband (wavelet) compression 
or encoding as described in Multirate Systems and Filter Banks by Vaidyanathan, PTR Prentice-Hall, 
Inc., Englewood Cliffs, New Jersey, (1993) or discrete cosine transform (DCT) encoding as 
described in JPEG: Still Image Data Compression Standard by Pennebaker et al., Van Nostrand 
Reinhold, New York (1993). 

As is known in the art, "lossy" compression methods introduce some data distortion to 
provide increased data compression. The data distortion refers to variations between the original data 
before compression and the data resulting after compression and decompression. For purposes of 
illustration below, the compression or encoding of function block 112 is presumed to be wavelet 
encoding. 

Function block 114 indicates that the wavelet encoded estimated error from function block 
112 is further compressed or "coded" by a conventional "lossless" still image compression method to 
form compressed data 116. A preferred conventional "lossless" still image compression method is 
entropy encoding as described in JPEG: Still Image Data Compression Standard by Pennebaker et al. 
As is known in the art, "lossless" compression methods introduce no data distortion. 

An error feedback loop 118 utilizes the wavelet encoded estimated error from function 
block 112 for the object in frame N to obtain a prior quantized object for succeeding frame N+1. As 
an initial step in feedback loop 118, function block 120 indicates that the wavelet encoded estimated 
error from function block 112 is inverse wavelet coded, or wavelet decoded, to form a quantized 
error 122 for the object in image frame N. 

The effect of successively encoding and decoding estimated error 110 by a lossy still image 
compression method is to omit from quantized error 122 video information that is generally 
imperceptible by viewers. This information typically is of higher frequencies. As a result, omitting 
such higher frequency components typically can provide image compression of up to about 200% 
with only minimal degradation of image quality. 

Function block 124 indicates that quantized error 122 and predicted object 102, both for 
image frame N, are added together to form a quantized object 126 for image frame N. After a timing 
coordination delay 128, quantized object 126 becomes quantized prior object 98 and is used as the 
basis for processing the corresponding object in image frame N+1. 

Encoder process 64 utilizes the temporal correlation of corresponding objects in successive 
image frames to obtain improved interframe compression, and also utilizes the spatial correlation 
within objects to obtain accurate and efficient intraframe compression. For the interframe 
compression, motion estimation and compensation are performed so that an object defined in one 
frame can be estimated in a successive frame. The motion-based estimation of the object in the 
successive frame requires significantly less information than a conventional block-based 
representation of the object. For the intraframe compression, an estimated error signal for each 
object is compressed to utilize the spatial correlation of the object within a frame and to allow 
different objects to be represented at different resolutions. Feedback loop 118 allows objects in 
subsequent frames to be predicted from fully decompressed objects, thereby preventing accumulation 
of estimation error. 
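
The benefit of predicting from decompressed objects can be seen in a deliberately simplified, one-value-per-frame sketch. Everything here is an assumption made for illustration: the "object" is a single number, the motion transformation is the identity, and rounding to a multiple of `step` stands in for the lossy wavelet coding. The function names are hypothetical.

```python
def quantize(value, step):
    """Lossy stand-in for wavelet coding: round to a multiple of step."""
    return step * round(value / step)


def closed_loop(samples, step):
    """Predict each frame from the previously reconstructed (quantized) value."""
    reconstructed = []
    prior = 0  # quantized prior object; nothing precedes the first frame
    for current in samples:
        predicted = prior                 # identity "motion" in this sketch
        q_error = quantize(current - predicted, step)
        prior = predicted + q_error       # what the decoder will also hold
        reconstructed.append(prior)
    return reconstructed


def open_loop(samples, step):
    """Predict each frame from the previous original value (no feedback)."""
    reconstructed = []
    prior_original = 0
    decoder_value = 0
    for current in samples:
        decoder_value += quantize(current - prior_original, step)
        reconstructed.append(decoder_value)
        prior_original = current
    return reconstructed


samples = [3, 6, 9, 12, 15, 18]
for name, coder in [("closed", closed_loop), ("open", open_loop)]:
    recon = coder(samples, 5)
    print(name, [r - s for r, s in zip(recon, samples)])
```

With feedback, the reconstruction error stays within half a quantization step in every frame; without it, the per-frame quantization errors add up from frame to frame.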

Encoder process 64 provides as an output a compressed or encoded representation of a 
digitized video signal representing display motion in video sequences of multiple image frames. The 
compressed or encoded representation includes object masks 66, feature points 68, affine transform 
coefficients 104, and compressed error data 116. The encoded representation may be stored or 
transmitted, according to the particular application in which the video information is used. 

Fig. 3B is a functional block diagram of a master object encoder process 130 for encoding 
or compressing master object 90. Function block 132 indicates that master object 90 is compressed 
or coded by a conventional "lossy" still image compression method such as lattice subband (wavelet) 
compression or discrete cosine transform (DCT) encoding. Preferably, function block 132 employs 
wavelet encoding. 

Function block 134 indicates that the wavelet encoded master object from function block 
132 is further compressed or coded by a conventional "lossless" still image compression method to 
form compressed master object data 136. A preferred conventional lossless still image compression 
method is entropy encoding. 

Encoder process 130 provides as an output compressed master object 136. Together with 
the compressed or encoded representations provided by encoder process 64, compressed master 
object 136 may be decompressed or decoded after storage or transmission to obtain a video sequence 
of multiple image frames. 

Encoder process 64 is described with reference to encoding video information 
corresponding to a single object within an image frame. As shown in Figs. 2A and 2B and indicated 
above, encoder process 64 is performed separately for each of the objects (e.g., objects 56 and 58 of 
Figs. 2A and 2B) in an image frame. Moreover, many video images include a background over 
which arbitrary numbers of image features or objects are rendered. Preferably, the background is 
processed as an object according to this invention after all user-designated objects are processed. 

Processing of the objects in an image frame requires that the objects be separately identified. 
Preferably, encoder process 64 is applied to the objects of an image frame beginning with the 
forward-most object or objects and proceeding successively to the back-most object (e.g., the 
background). The compositing of the encoded objects into a video image preferably proceeds from 
the rear-most object (e.g., the background) and proceeds successively to the forward-most object 
(e.g., rectangular solid 56 in Figs. 2A and 2B). The layering of encoding objects may be 
communicated as distinct layering data associated with the objects of an image frame or, 
alternatively, by transmitting or obtaining the encoded objects in a sequence corresponding to the 
layering or compositing sequence. 
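
The rear-to-front compositing order can be sketched as below. The representation is an assumption made for brevity: each decoded object is a mapping from pixel coordinates to a value, and the list order itself carries the layering, as in the second alternative described above.

```python
def composite(layers, width, height, empty="."):
    """Paint layers rear-most first so nearer objects overwrite farther ones."""
    frame = [[empty] * width for _ in range(height)]
    for layer in layers:                  # layers[0] is the background
        for (x, y), value in layer.items():
            frame[y][x] = value
    return ["".join(row) for row in frame]


background = {(x, y): "b" for x in range(4) for y in range(2)}
solid = {(1, 0): "S", (2, 0): "S", (1, 1): "S"}
print(composite([background, solid], 4, 2))  # ['bSSb', 'bSbb']
```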
Object Segmentation And Tracking 

In a preferred embodiment, the segmentation of objects within image frames referred to in 
function block 66 allows interactive segmentation by users. The object segmentation of this 
invention provides improved accuracy in segmenting objects and is relatively fast and provides users 
with optimal flexibility in defining objects to be segmented. 

Fig. 4 is a functional block diagram of an object segmentation process 140 for segmenting 
selected objects from an image frame of a video sequence. Object segmentation according to process 
140 provides a perceptual grouping of objects that is accurate and quick and easy for users to define. 

Fig. 5A is a simplified representation of display screen 50 of video display device 52 showing 
image frame 54a and the segmentation of rectangular solid object 56a. In its rendering on display 
screen 50, rectangular solid object 56a includes an object perimeter 142 (shown spaced apart from 
object 56a for clarity) that bounds an object interior 144. Object interior 144 refers to the outline of 
object 56a on display screen 50 and in general may correspond to an inner surface or, as shown, an 
outer surface of the image feature. Fig. 5B is an enlarged representation of a portion of display 
screen 50 showing the semi-automatic segmentation of rectangular solid object 56a. The following 
description is made with specific reference to rectangular solid object 56a, but is similarly applicable 
to each object to be segmented from an image frame. 

Function block 146 indicates that a user forms within object interior 144 an interior outline 
148 of object perimeter 142. The user preferably forms interior outline 148 with a conventional 
pointer or cursor control device, such as a mouse or trackball. Interior outline 148 is formed within a 
nominal distance 150 from object perimeter 142. Nominal distance 150 is selected by a user to be 
sufficiently large that the user can form interior outline 148 relatively quickly within nominal 
distance 150 of perimeter 142. Nominal distance 150 corresponds, for example, to between about 4 
and 10 pixels. 

Function block 146 is performed in connection with a key frame of a video sequence. With 
reference to a scene in a conventional motion picture, for example, the key frame could be the first 
frame of the multiple frames in a scene. The participation of the user in this function renders object 
segmentation process 140 semi-automatic, but significantly increases the accuracy and flexibility 
with which objects are segmented. Other than for the key frame, objects in subsequent image frames 
are segmented automatically as described below in greater detail. 

Function block 152 indicates that interior outline 148 is expanded automatically to form an 
exterior outline 156. The formation of exterior outline 156 is performed as a relatively simple image 
magnification of outline 148 so that exterior outline 156 is a user-defined number of pixels from 
interior outline 148. Preferably, the distance between interior outline 148 and exterior outline 156 is 
approximately twice distance 150. 

Function block 158 indicates that pixels between interior outline 148 and exterior outline 
156 are classified according to predefined attributes as to whether they are within object interior 144, 
thereby to identify automatically object perimeter 142 and a corresponding mask 80 of the type 
described with reference to Fig. 3A. Preferably, the image attributes include pixel color and position, 
but either attribute could be used alone or with other attributes. 

In the preferred embodiment, each of the pixels in interior outline 148 and exterior outline 
156 defines a "cluster center" represented as a five-dimensional vector in the form of (r, g, b, x, y). 
The terms r, g, and b correspond to the respective red, green, and blue color components associated 
with each of the pixels, and the terms x and y correspond to the pixel locations. The m-number of 
cluster center vectors corresponding to pixels in interior outline 148 are denoted as 
$\{I_0, I_1, \ldots, I_{m-1}\}$, and the n-number of cluster center vectors corresponding to pixels in 
exterior outline 156 are denoted as $\{O_0, O_1, \ldots, O_{n-1}\}$. 

Pixels between the cluster center vectors $I_i$ and $O_j$ are classified by identifying the vector to 
which each pixel is closest in the five-dimensional vector space. For each pixel, the absolute distances 
$d_i$ and $d_j$ to each of respective cluster center vectors $I_i$ and $O_j$ are computed according to the 
following equations: 

$$d_i = w_{color}\left(|r - r_i| + |g - g_i| + |b - b_i|\right) + w_{coord}\left(|x - x_i| + |y - y_i|\right), \quad 0 \le i < m,$$
$$d_j = w_{color}\left(|r - r_j| + |g - g_j| + |b - b_j|\right) + w_{coord}\left(|x - x_j| + |y - y_j|\right), \quad 0 \le j < n,$$

in which $w_{color}$ and $w_{coord}$ are weighting factors for the respective color and pixel position information. 
Weighting factors $w_{color}$ and $w_{coord}$ have values that sum to 1 and are otherwise selectable by a user. 
Preferably, weighting factors $w_{color}$ and $w_{coord}$ are of an equal value of 0.5. Each pixel is associated 
with object interior 144 or exterior according to the minimum five-dimensional distance to one of the 
cluster center vectors $I_i$ and $O_j$. 
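
A minimal sketch of this classification follows, assuming each pixel and each cluster center is given as an (r, g, b, x, y) tuple. The function names and the rule that a tie goes to the interior are assumptions for this example; the text does not say how ties are resolved.

```python
def distance(pixel, center, w_color=0.5, w_coord=0.5):
    """Weighted absolute distance between two (r, g, b, x, y) vectors."""
    r, g, b, x, y = pixel
    rc, gc, bc, xc, yc = center
    color = abs(r - rc) + abs(g - gc) + abs(b - bc)
    coord = abs(x - xc) + abs(y - yc)
    return w_color * color + w_coord * coord


def classify(pixel, interior_centers, exterior_centers):
    """Assign the pixel to whichever outline holds its nearest cluster center."""
    d_in = min(distance(pixel, c) for c in interior_centers)
    d_out = min(distance(pixel, c) for c in exterior_centers)
    return "interior" if d_in <= d_out else "exterior"  # tie rule is assumed


interior = [(200, 30, 30, 10, 10)]   # reddish pixel on the interior outline
exterior = [(20, 20, 20, 14, 10)]    # dark pixel on the exterior outline

print(classify((190, 35, 32, 12, 10), interior, exterior))  # interior
print(classify((25, 22, 18, 12, 10), interior, exterior))   # exterior
```

Both test pixels sit at the same location midway between the two centers, so the color term alone decides the outcome here.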

Function block 162 indicates that a user selects at least two, and preferably more (e.g., 4 to 
6), feature points in each object of an initial or key frame. Preferably, the feature points are relatively 
distinctive aspects of the object. With reference to rectangular solid image feature 56, for example, 
corners 70a-70c could be selected as feature points. 

Function block 164 indicates that a block 166 of multiple pixels centered about each 
selected feature point (e.g., corners 70a-70c) is defined and matched to a corresponding block in a 
subsequent image frame (e.g., the next successive image frame). Pixel block 166 is user defined, but 
preferably includes a 32 x 32 pixel array that includes only pixels within image interior 144. Any 
pixels 168 (indicated by cross-hatching) of pixel block 166 falling outside object interior 144 as 
determined by function block 158 (e.g., corners 70b and 70c) are omitted. Pixel blocks 166 are 
matched to the corresponding pixel blocks in the next image frame according to a minimum absolute 
error identified by a conventional block match process or a polygon match process, as described 
below in greater detail. 

Function block 170 indicates that a sparse motion transformation of an object is determined 
from the corresponding feature points in two successive image frames. Function block 172 indicates 
that mask 80 of the current image frame is transformed according to the sparse motion transformation 
to provide an estimation of the mask 80 for the next image frame. Any feature point in a current 
frame not identified in a successive image frame is disregarded. 

Function block 174 indicates that the resulting estimation of mask 80 for the next image 
frame is delayed by one frame, and functions as an outline 176 for a next successive cycle. 
Similarly, function block 178 indicates that the corresponding feature points also are delayed by one 
frame, and utilized as the initial feature points 180 for the next successive frame. 
Polygon Match Method 

Fig. 6 is a functional block diagram of a polygon match process 200 for determining a 
motion vector for each corresponding pair of pixels in successive image frames. Such a dense 
motion vector determination provides the basis for determining the dense motion transformations 96 
of Fig. 3A. 

Polygon match process 200 is capable of determining extensive motion between successive 
image frames like the conventional block match process. In contrast to the conventional block match 
process, however, polygon match process 200 maintains its accuracy for pixels located near or at an 
object perimeter and generates significantly less error. A preferred embodiment of polygon match 
method 200 has improved computational efficiency. 

Polygon block method 200 is described with reference to Figs. 7A and 7B, which are 
simplified representations of display screen 50 showing two successive image frames 202a and 202b 
in which an image feature 204 is rendered as objects 204a and 204b, respectively. 

Function block 206 indicates that objects 204a and 204b for image frames 202a and 202b 
are identified and segmented by, for example, object segmentation method 140. 

Function block 208 indicates that dimensions are determined for a pixel block 210b (e.g., 
15x15 pixels) to be applied to object 204b and a search area 212 about object 204a. Pixel block 210b 
defines a region about each pixel in object 204b for which region a corresponding pixel block 210a is 
identified in object 204a. Search area 212 establishes a region within which corresponding pixel 
block 210a is sought. Preferably, pixel block 210b and search area 212 are right regular arrays of 
pixels and of sizes defined by the user. 

Function block 214 indicates that an initial pixel 216 in object 204b is identified and 
designated the current pixel. Initial pixel 216 may be defined by any of a variety of criteria such as, 
for example, the pixel at the location of greatest vertical extent and minimum horizontal extent. With 
the pixels on display screen 50 arranged according to a coordinate axis 220 as shown, initial pixel 
216 may be represented as the pixel of object 204b having a maximum y-coordinate value and a 
minimum x-coordinate value. 

Function block 222 indicates that pixel block 210b is centered at and extends about the 
current pixel. 

Function block 224 represents an inquiry as to whether pixel block 210b includes pixels that 
are not included in object 204b (e.g., pixels 226 shown by cross-hatching in Fig. 7B). This inquiry is 
made with reference to the objects identified according to function block 206. Whenever pixels 
within pixel block 210b positioned at the current pixel fall outside object 204b, function block 224 
proceeds to function block 228 and otherwise proceeds to function block 232. 

Function block 228 indicates that pixels of pixel block 210b falling outside object 204b 
(e.g., pixels 226) are omitted from the region defined by pixel block 210b so that it includes only 
pixels within object 204b. As a result, pixel block 210b defines a region that typically would be of a 
polygonal shape more complex than the originally defined square or rectangular region. 

Function block 232 indicates that a pixel in object 204a is identified as corresponding to the 
current pixel in object 204b. The pixel in object 204a is referred to as the prior corresponding pixel. 
Preferably, the prior corresponding pixel is identified by forming a pixel block 210a about each pixel 
in search area 212 and determining a correlation between the pixel block 210a and pixel block 210b 
about the current pixel in object 204b. Each correlation between pixel blocks 210a and 210b may be 
determined, for example, as a mean absolute error. The prior corresponding pixel is identified by 
identifying the pixel block 210a in search area 212 for which the mean absolute error relative to pixel 
block 210b is minimized. A mean absolute error E for a pixel block 210a relative to pixel block 
210b may be determined as: 

$$E = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( |r_{ij} - r_{ij}'| + |g_{ij} - g_{ij}'| + |b_{ij} - b_{ij}'| \right),$$

in which the terms $r_{ij}$, $g_{ij}$, and $b_{ij}$ correspond to the respective red, green, and blue color components 
associated with each of the pixels in pixel block 210b and the terms $r_{ij}'$, $g_{ij}'$, and $b_{ij}'$ correspond to the 
respective red, green, and blue color components associated with each of the pixels in pixel block 
210a. 

As set forth above, the summations for the mean absolute error E imply pixel blocks having 
pixel arrays of mxn pixel dimensions. Pixel blocks 210b of polygonal configuration are 
accommodated relatively simply by, for example, defining zero values for the color components of 
all pixels outside polygonal pixel blocks 210b. 
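
The error measure for a polygonal block can be sketched as follows. This sketch makes one assumption beyond the text: rather than zeroing color values, it simply skips positions flagged as outside the object, so that excluded positions contribute nothing to the sum. The function name `block_error` and the `inside` flag grid are hypothetical.

```python
def block_error(block_b, block_a, inside):
    """Sum of absolute RGB differences over positions flagged as inside.

    block_b, block_a: equally sized grids of (r, g, b) tuples.
    inside: grid of booleans, False where block_b falls outside the object.
    """
    total = 0
    for row_b, row_a, row_in in zip(block_b, block_a, inside):
        for (r, g, b), (r2, g2, b2), keep in zip(row_b, row_a, row_in):
            if keep:  # pixels outside the object are omitted
                total += abs(r - r2) + abs(g - g2) + abs(b - b2)
    return total


block_b = [[(10, 10, 10), (20, 20, 20)],
           [(30, 30, 30), (99, 99, 99)]]   # last pixel lies outside the object
block_a = [[(12, 10, 9), (20, 25, 20)],
           [(30, 30, 30), (0, 0, 0)]]
polygon = [[True, True], [True, False]]
square = [[True, True], [True, True]]

print(block_error(block_b, block_a, polygon))  # 8
print(block_error(block_b, block_a, square))   # 305
```

The large second value shows how a single background pixel can dominate the error of an otherwise well-matched block.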

Function block 234 indicates that a motion vector MV between each pixel in object 204b 
and the corresponding prior pixel in object 204a is determined. A motion vector is defined as the 
difference between the locations of the pixel in object 204b and the corresponding prior pixel in 
object 204a: 

$$MV = (x_i - x_k',\; y_j - y_l'),$$

in which the terms $x_i$ and $y_j$ correspond to the respective x- and y-coordinate positions of the pixel in 
pixel block 210b, and the terms $x_k'$ and $y_l'$ correspond to the respective x- and y-coordinate positions 
of the corresponding prior pixel in pixel block 210a. 
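
The search for the prior corresponding pixel and the resulting motion vector can be sketched as below. This is a simplified illustration, not the polygon match itself: it assumes single-channel (grayscale) frames, a square block that lies fully inside the current frame, and hypothetical function names.

```python
def block_sad(cur, prior, cx, cy, px, py, half):
    """Sum of absolute differences between blocks centered at (cx, cy) and (px, py)."""
    return sum(abs(cur[cy + dy][cx + dx] - prior[py + dy][px + dx])
               for dy in range(-half, half + 1)
               for dx in range(-half, half + 1))


def motion_vector(cur, prior, cx, cy, half, search):
    """Return (cx - px, cy - py) for the best-matching prior pixel (px, py)."""
    height, width = len(prior), len(prior[0])
    best = None
    # Only visit candidate centers whose block stays inside the prior frame.
    for py in range(max(half, cy - search), min(height - half, cy + search + 1)):
        for px in range(max(half, cx - search), min(width - half, cx + search + 1)):
            err = block_sad(cur, prior, cx, cy, px, py, half)
            if best is None or err < best[0]:
                best = (err, px, py)
    _, px, py = best
    return (cx - px, cy - py)


def make_frame(size, top, left):
    """Blank frame containing one 3x3 patch of distinct values."""
    frame = [[0] * size for _ in range(size)]
    value = 10
    for y in range(top, top + 3):
        for x in range(left, left + 3):
            frame[y][x] = value
            value += 10
    return frame


prior = make_frame(8, top=2, left=2)   # patch centered at (3, 3)
cur = make_frame(8, top=3, left=4)     # same patch moved to (5, 4)
print(motion_vector(cur, prior, cx=5, cy=4, half=1, search=2))  # (2, 1)
```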

Function block 236 represents an inquiry as to whether object 204b includes any remaining 
pixels. Whenever object 204b includes remaining pixels, function block 236 proceeds to function 
block 238 and otherwise proceeds to end block 240. 

Function block 238 indicates that a next pixel in object 204b is identified according to a 
predetermined format or sequence. With the initial pixel selected as described above in reference to 
function block 214, subsequent pixels may be defined by first identifying the next adjacent pixel in a 
row (i.e., of a common y-coordinate value) and, if object 204b includes no other pixels in a row, 
proceeding to the first or left-most pixel (i.e., of minimum x-coordinate value) in a next lower row. 
The pixel so identified is designated the current pixel and function block 238 returns to function 
block 222. 

Polygon block method 200 accurately identifies corresponding pixels even if they are 
located at or near an object perimeter. A significant source of error in conventional block matching 
processes is eliminated by omitting or disregarding pixels of pixel blocks 210b falling outside object 
204b. Conventional block matching processes rigidly apply a uniform pixel block configuration and 
are not applied with reference to a segmented object. The uniform block configurations cause 
significant errors for pixels adjacent the perimeter of an object because the pixels outside the object 
can undergo significant changes as the object moves or its background changes. With such 
extraneous pixel variations included in conventional block matching processes, pixels in the vicinity 
of an object perimeter cannot be correlated accurately with the corresponding pixels in prior image 
frames. 

For each pixel in object 204b, a corresponding prior pixel in object 204a is identified by 
comparing pixel block 210b with a pixel block 210a for each of the pixels in prior object 204a. The 
corresponding prior pixel is the pixel in object 204a having the pixel block 210a that best correlates 
to pixel block 210b. If processed in a conventional manner, such a determination can require 
substantial computation to identify each corresponding prior pixel. To illustrate, for pixel blocks 
having dimensions of $n \times n$ pixels, which are significantly smaller than a search area 212 having 
dimensions of $m \times m$ pixels, approximately $n^2 \times m^2$ calculations are required to identify each 
corresponding prior pixel in the prior object 204a. 
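
As a rough worked instance of this estimate, take the 15x15 pixel block mentioned earlier ($n = 15$) and a search area of 45x45 pixels ($m = 45$, a value assumed here purely for illustration, since no search area size is stated):

$$n^2 \times m^2 = 15^2 \times 45^2 = 225 \times 2025 = 455{,}625$$

pixel comparisons for each pixel of object 204b, which gives a sense of why reducing redundant calculations matters.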
Pixel Block Correlation Process 

Fig. 8 is a functional block diagram of a modified pixel block correlation process 260 that 
preferably is substituted for the one described with reference to function block 232. Modified 
correlation process 260 utilizes redundancy inherent in correlating pixel blocks 210b and 210a to 
significantly reduce the number of calculations required. 

Correlation process 260 is described with reference to Figs. 9A-9G and 10A-10G, which 
schematically represent arbitrary groups of pixels corresponding to successive image frames 202a 
and 202b. In particular, Fig. 9A is a schematic representation of a pixel block 262 having 
dimensions of 5x5 pixels in which each letter corresponds to a different pixel. The pixels of pixel 
block 262 are arranged as a right regular array of pixels that includes distinct columns 264. Fig. 9B 
represents an array of pixels 266 having dimensions of qxq pixels and corresponding to a search area 
212 in a prior image frame 202a. Each of the numerals in Fig. 9B represents a different pixel. 
Although described with reference to a conventional right regular pixel block 262, correlation 
process 260 is similarly applicable to polygonal pixel blocks of the type described with reference to 
polygon match process 200. 

Function block 268 indicates that an initial pixel block (e.g., pixel block 262) is defined with 
respect to a central pixel M and scanned across a search area 212 (e.g., pixel array 266) generally in a 
raster pattern (partly shown in Fig. 7A) as in a conventional block match process. Figs. 9C-9G 
schematically illustrate five of the approximately $q^2$ steps in the block matching process between 
pixel block 262 and pixel array 266. 

Although the scanning of pixel block 262 across pixel array 266 is performed in a 
conventional manner, computations relating to the correlation between them are performed 
differently according to this invention. In particular, a correlation (e.g., a mean absolute error) is 
determined and stored for each column 264 of pixel block 262 in each scan position. The correlation 
that is determined and stored for each column 264 of pixel block 262 in each scanned position is 
referred to as a column correlation 270, several of which are symbolically indicated in Figs. 9C-9G 
by referring to the correlated pixels. To illustrate, Fig. 9C shows a column correlation 270(1) that is 
determined for the single column 264 of pixel block 262 aligned with pixel array 266. Similarly, Fig. 
9D shows column correlations 270(2) and 270(3) that are determined for the two columns 264 of 
pixel block 262 aligned with pixel array 266. Figs. 9E-9G show similar column correlations with 
pixel block 262 at three exemplary subsequent scan positions relative to pixel array 266. 

The scanning of initial pixel block 262 over pixel array 266 provides a stored array or 
database of column correlations. With pixel block 262 having r-number of columns 264, and pixel 
array 266 having qxq pixels, the column correlation database includes approximately $rq^2$ 
column correlations. This number of column correlations is only approximate because pixel block 
262 preferably is initially scanned across pixel array 266 such that pixel M is aligned with the first 
row of pixels in pixel array 266. 

The remaining steps beginning with the one indicated in Fig. 9C occur after two complete 
scans of pixel block 262 across pixel array 266 (i.e., with pixel M aligned with the first and second 
rows of pixel array 266). 

Function block 274 indicates that a next pixel block 276 (Fig. 10A) is defined from, for 
example, image frame 202b with respect to a central pixel N in the same row as pixel M. Pixel block 
276 includes a column 278 of pixels not included in pixel block 262 and columns 280 of pixels 
included in pixel block 262. Pixel block 276 does not include a column 282 (Fig. 9A) that was 
included in pixel block 262. Such an incremental definition of next pixel block 276 is substantially 
the same as that used in conventional block matching processes. 

Function block 284 indicates that pixel block 276 is scanned across pixel array 266 in the 
manner described above with reference to function block 268. As with Figs. 9C-9G, Figs. 10B-10G 
represent the scanning of pixel block 276 across pixel array 266. 

Function block 286 indicates that for column 278 a column correlation is determined and 
stored at each scan position. Accordingly, column correlations 288(1)-288(5) are made with respect 
to the scanned positions of column 278 shown in respective Figs. 10B-10F. 

Function block 290 indicates that for each of columns 280 in pixel block 276 a stored 
column determination is retrieved for each scan position previously computed and stored in function 
block 268. For example, column correlation 270(1) of Fig. 9C is the same as column correlation 
270'(1) of Fig. 10C. Similarly, column correlations 270'(2), 270'(3), 270'(5)-270'(8), and 270'(15)-
270'(18) of Figs. 10D-10F are the same as the corresponding column correlations in Figs. 9D, 9E, 
and 9G. For pixel block 276, therefore, only one column correlation 288 is calculated for each scan 
position. As a result, the number of calculations required for pixel block 276 is reduced by nearly 80 
percent, since four of its five columns are retrieved rather than recomputed. 
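
The reuse of stored column correlations for a horizontally adjacent block can be sketched as follows. This is an illustrative approximation under stated assumptions: frames are grayscale, only scan positions with full overlap are visited (the text also stores partially aligned positions), and the cache keyed by column indices and row offsets is a hypothetical data structure rather than the stored array of the specification.

```python
def make_column_error(cur, prior, height):
    """Return a memoized column-error function and its cache (for inspection)."""
    cache = {}

    def column_error(c, r0, p, q0):
        # Error between column c of the current frame (rows r0..r0+height-1)
        # and column p of the prior frame (rows q0..q0+height-1).
        key = (c, r0, p, q0)
        if key not in cache:
            cache[key] = sum(abs(cur[r0 + k][c] - prior[q0 + k][p])
                             for k in range(height))
        return cache[key]

    return column_error, cache


def scan_block(column_error, c0, r0, size, prior_w, prior_h):
    """Block error at every scan position where the block fully overlaps."""
    return {(p0, q0): sum(column_error(c0 + i, r0, p0 + i, q0)
                          for i in range(size))
            for q0 in range(prior_h - size + 1)
            for p0 in range(prior_w - size + 1)}


# Synthetic frames with arbitrary values, for illustration only.
cur = [[(3 * x + 5 * y) % 17 for x in range(12)] for y in range(12)]
prior = [[(7 * x + 2 * y) % 13 for x in range(20)] for y in range(20)]

column_error, cache = make_column_error(cur, prior, height=5)
scan_block(column_error, c0=2, r0=3, size=5, prior_w=20, prior_h=20)
first = len(cache)            # columns computed for the block about pixel M
errors_n = scan_block(column_error, c0=3, r0=3, size=5, prior_w=20, prior_h=20)
second = len(cache) - first   # extra columns needed for the block about pixel N
print(first, second)          # 1280 320
```

In this small example the second block needs a quarter of the column computations of the first; the saving approaches the four-fifths figure for a 5x5 block as the search area grows, because the fully recomputed positions at the edge of each scan row become a smaller share of the total.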

Function block 292 indicates that a subsequent pixel block 294 (Fig. 11A) is defined with 
respect to a central pixel R in the next successive row relative to pixel M. Pixel block 294 includes 
columns 296 of pixels that are similar to but distinct from columns 264 of pixels in pixel block 262 
of Fig. 9A. In particular, columns 296 include pixels A'-E' not included in columns 264. Such an 
incremental definition of subsequent pixel block 294 is substantially the same as that used in 
conventional block matching processes. 

Function block 298 indicates that pixel block 294 is scanned across pixel array 266 (Fig. 
9B) in the manner described above with reference to function blocks 268 and 276. Figs. 11B-11F 
represent the scanning of pixel block 294 across pixel array 266. 

Function block 300 indicates that a column correlation is determined and stored for each of 
columns 296. Accordingly, column correlations 302(1)-302(18) are made with respect to the 
scanned positions of columns 296 shown in Figs. 11B-11F. 

Each of column correlations 302(1)-302(18) may be calculated in an abbreviated manner 
with reference to column correlations made with respect to pixel block 262 (Fig. 9A). 

For example, column correlations 302(4)-302(8) of Fig. 11D include subcolumn 
correlations 304'(4)-304'(8) that are the same as subcolumn correlations 304(4)-304(8) of Fig. 9E. 
Accordingly, column correlations 302(4)-302(8) may be determined from respective column 
correlations 270(4)-270(8) by subtracting from the latter correlation values for pixel pairs 01A, 02B, 03C, 
04D, and 05E to form subcolumn correlations 304(4)-304(8), respectively. Column correlations 
302(4)-302(8) may be obtained by adding correlation values for the pixel pairs 56A', 57B', 58C', 
59D', and 50E' to the respective subcolumn correlation values 304(4)-304(8). 

The determination of column correlations 302(4)-302(8) from respective column 
correlations 270(4)-270(8) entails subtracting individual pixel correlation values corresponding to the 
row of pixels A-E of pixel block 262 not included in pixel block 294, and adding pixel correlation 
values for the row of pixels A'-E' included in pixel block 294 but not pixel block 262. This method 
substitutes, for each of column correlations 302(4)-302(8), one subtraction and one addition for the 
five additions that would be required to determine each column correlation in a conventional manner. 
With pixel blocks of larger dimensions as are preferred, the improvement of this method over 
conventional calculation methods is even greater. 

Conventional block matching processes 
identify only total block correlations for each scan position of initial pixel block 262 relative to pixel 
array 266. As a consequence, all correlation values for all pixels must be calculated separately for 
each scan position. In contrast, correlation process 260 utilizes stored column correlations 270 to 
significantly reduce the number of calculations required. The improvements in speed and processor 
resource requirements provided by correlation process 260 more than offset the system requirements 
for storing the column correlations. 
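
The one-subtraction, one-addition update for a block in the next row can be sketched as below. The sketch assumes grayscale frames and that the block and its scan position both move down by one row, so that all but the top and bottom pixel pairs of a column are unchanged; the function names are hypothetical.

```python
def column_error(cur, prior, c, r0, p, q0, height):
    """Sum of absolute differences down one aligned column pair."""
    return sum(abs(cur[r0 + k][c] - prior[q0 + k][p]) for k in range(height))


def shift_down(prev_error, cur, prior, c, r0, p, q0, height):
    """Update a column error when block and scan position both move down a row."""
    leaving = abs(cur[r0][c] - prior[q0][p])                     # top pair dropped
    entering = abs(cur[r0 + height][c] - prior[q0 + height][p])  # bottom pair added
    return prev_error - leaving + entering


# Synthetic frames with arbitrary values, for illustration only.
cur = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
prior = [[(7 * x + 2 * y) % 13 for x in range(8)] for y in range(8)]

old = column_error(cur, prior, c=2, r0=1, p=4, q0=0, height=5)
new = shift_down(old, cur, prior, c=2, r0=1, p=4, q0=0, height=5)
# The incremental result matches a full recomputation one row lower.
assert new == column_error(cur, prior, c=2, r0=2, p=4, q0=1, height=5)
```

The cost of the update is independent of the column height, which is why the advantage grows with larger pixel blocks.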

It will be appreciated that correlation process 260 has been described with reference to Figs. 
9-11 to illustrate specific features of this invention. As shown in the illustrations, this invention 
includes recurring or cyclic features that are particularly suited to execution by a computer system. 
These recurring or cyclic features are dependent upon the dimensions of pixel blocks and pixel arrays 
and are well understood and can be implemented by persons skilled in the art. 
Multi-Dimensional Transformation 

Fig. 12 is a functional block diagram of a transformation method 350 that includes 
generating a multi-dimensional transformation between objects in first and second successive image 
frames and quantizing the mapping for transmission or storage. The multi-dimensional
transformation preferably is utilized in connection with function block 96 of Fig. 3. Transformation 
method 350 is described with reference to Fig. 7A and Fig. 13, the latter of which like Fig. 7B is a 
simplified representation of display screen 50 showing image frame 202b in which image feature 204 
is rendered as object 204b. 

Transformation method 350 preferably provides a multi-dimensional affine transformation 
capable of representing complex motion that includes any or all of translation, rotation, 
magnification, and shear. Transformation method 350 provides a significant improvement over 
conventional video compression methods such as MPEG-1, MPEG-2, and H.26X, which are of only
one dimension and represent only translation. In this regard, the dimensionality of a transformation 
refers to the number of coordinates in the generalized form of the transformation, as described below 
in greater detail. Increasing the accuracy with which complex motion is represented according to this 
invention results in fewer errors than by conventional representations, thereby increasing 
compression efficiency. 

Function block 352 indicates that a dense motion estimation of the pixels in objects 204a 
and 204b is determined. Preferably, the dense motion estimation is obtained by polygon match 
process 200. As described above, the dense motion estimation includes motion vectors between 
pixels at coordinates (x_i, y_i) in object 204b of image frame 202b and corresponding pixels at
locations (x_i', y_i') of object 204a in image frame 202a.

Function block 354 indicates that an array of transformation blocks 356 is defined to 
encompass object 204b. Preferably, transformation blocks 356 are right regular arrays of pixels 
having dimensions of, for example, 32x32 pixels. 

Function block 358 indicates that a multi-dimensional affine transformation is generated for
each transformation block 356. Preferably, the affine transformations are of first order and
represented as:

x_i' = a x_i + b y_i + c

y_i' = d x_i + e y_i + f,

and are determined with reference to all pixels for which the motion vectors have a relatively high
confidence. These affine transformations are of two dimensions in that x_i' and y_i' are defined relative
to two coordinates: x_i and y_i.

The relative confidence of the motion vectors refers to the accuracy with which the motion
vector between corresponding pixels can be determined uniquely relative to other pixels. For
example, motion vectors between particular pixels that are in relatively large pixel arrays and are
uniformly colored (e.g., black) cannot typically be determined accurately. In particular, for a black
pixel in a first image frame, many pixels in the pixel array of the subsequent image frame will have
the same correlation (i.e., mean absolute value error between pixel blocks).

In contrast, pixel arrays in which pixels correspond to distinguishing features typically will
have relatively high correlations for particular corresponding pixels in successive image frames.

The relatively high correlations are preferably represented as a minimal absolute value error
determination for a particular pixel. Motion vectors of relatively high confidence may, therefore, be
determined relative to such uniquely low error values. For example, a high confidence motion vector
may be defined as one in which the minimum absolute value error for the motion vector is less than
the next greater error value associated with the pixel by a difference amount that is greater than a
threshold difference amount. Alternatively, high confidence motion vectors may be defined with
respect to the second order derivative of the absolute error values upon which the correlations are
determined. A second order derivative of more than a particular value would indicate a relatively
high correlation between specific corresponding pixels.
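
The first confidence test described above (minimum error separated from the next greater error by more than a threshold) can be sketched as follows. The function name `is_high_confidence` and the treatment of a pixel whose candidates all have the same error are assumptions made for illustration.

```python
import numpy as np

def is_high_confidence(errors, threshold):
    """Return True if the best match stands out from the runner-up.

    errors: absolute-value errors of every candidate match for one pixel.
    threshold: required gap between the minimum error and the next
               greater error value.
    """
    distinct = np.unique(errors)      # sorted, duplicates removed
    if distinct.size < 2:
        return False                  # all candidates match equally well (assumed low confidence)
    return bool((distinct[1] - distinct[0]) > threshold)
```

A uniformly black region yields identical errors for every candidate and is rejected, which matches the example given in the text.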

With n-number of pixels with such high-confidence motion vectors, the preferred affine
transformation equations are solved with reference to n-number of corresponding pixels in image
frames 202a and 202b. Image frames 202a and 202b must include at least three corresponding pixels
with high confidence motion vectors to solve for the six unknown coefficients
a, b, c, d, e, and f of the preferred affine transformation equations. With the preferred dimensions,
each of transformation blocks 356 includes 2^10 pixels of which significant numbers typically have
relatively high confidence motion vectors. Accordingly, the affine transformation equations are
over-determined in that a significantly greater number of pixels are available to solve for the
coefficients a, b, c, d, e, and f.

The resulting n-number of equations may be represented by the linear algebraic expression:

\[
\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & 1 \end{bmatrix}
\begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix}
=
\begin{bmatrix} x_1' & y_1' \\ x_2' & y_2' \\ \vdots & \vdots \\ x_n' & y_n' \end{bmatrix}
\]

Preferably these equations are solved by a conventional singular value decomposition (SVD) method,
which provides a minimal least-square error for the approximation of the dense motion vectors. A
conventional SVD method is described, for example, in Numerical Recipes in C, by Press et al.,
Cambridge University Press (1992).
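
As an illustration of the over-determined solve, the sketch below fits the six coefficients with NumPy's `numpy.linalg.lstsq`, a least-squares solver whose default LAPACK driver is SVD-based. It stands in for the SVD routine cited above rather than reproducing it; the function name `fit_affine` and the sample points are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of x' = a x + b y + c, y' = d x + e y + f.

    src: (n, 2) array of (x_i, y_i); dst: (n, 2) array of (x_i', y_i'); n >= 3.
    Returns the 3x2 matrix [[a, d], [b, e], [c, f]].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.column_stack([src, np.ones(len(src))])   # rows [x_i, y_i, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coeffs

# Hypothetical high-confidence pixels in a 32x32 block and a known motion.
true = np.array([[1.1, 0.2], [-0.3, 0.9], [4.0, -2.0]])
pts = np.array([[0, 0], [31, 0], [0, 31], [31, 31], [16, 16]], dtype=float)
moved = np.column_stack([pts, np.ones(len(pts))]) @ true
est = fit_affine(pts, moved)
```

With noise-free correspondences the fit recovers the coefficients; with real motion vectors it returns the coefficients that minimize the squared error over all n pixels.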

As described above, the preferred two-dimensional affine transformation equations are 
capable of representing translation, rotation, magnification, and shear of transformation blocks 356 
between successive image frames 202a and 202b. In contrast, conventional motion transformation 
methods used in prior compression standards employ simplified transformation equations of the 
form: 

x_i' = x_i + g

y_i' = y_i + h

The prior simplified transformation equations represent motion by only two coefficients, g 
and h, which represent only one-third the amount of information (i.e., coefficients) obtained by the
preferred multi-dimensional transformation equations. To obtain superior compression of the 
information obtained by transformation method 350 relative to conventional compression methods, 
the dimensions of transformation block 356 preferably are more than three times larger than the 
corresponding 16x16 pixel blocks employed in MPEG-1 and MPEG-2 compression methods. The 
preferred 32x32 pixel dimensions of transformation blocks 356 encompass four times the number of
pixels employed in the transformation blocks of conventional transformation methods. The larger
dimensions of transformation blocks 356, together with the improved accuracy with which the affine 
transformation coefficients represent motion of the transformation blocks 356, allow transformation 
method 350 to provide greater compression than conventional compression methods. 

It will be appreciated that the affine coefficients generated according to the present
invention typically would be non-integer, floating point values that could be difficult to compress
adequately without adversely affecting their accuracy. Accordingly, it is preferable to quantize the
affine transformation coefficients to reduce the bandwidth required to store or transmit them.

Function block 362 indicates that the affine transformation coefficients generated with
reference to function block 358 are quantized to reduce the bandwidth required to store or transmit
them. Fig. 14 is an enlarged fragmentary representation of a transformation block 356 showing three
selected pixels, 364a, 364b, and 364c from which the six preferred affine transformation coefficients
a-f may be determined.

Pixels 364a-364c are represented as pixel coordinates (x_1, y_1), (x_2, y_2), and (x_3, y_3),
respectively. Based upon the dense motion estimation of function block 352, pixels 364a-364c have
respective corresponding pixels (x_1', y_1'), (x_2', y_2'), and (x_3', y_3') in preceding image frame 202a. As is
conventional, pixel locations (x_i, y_i) are represented by integer values and are solutions to the affine
transformation equations upon which the preferred affine transformation coefficients are based.
Accordingly, selected pixels 364a-364c are used to calculate the corresponding pixels from the
preceding image frame 202a, which typically will be floating point values.

Quantization of these floating point values is performed by converting to integer format the
difference between corresponding pixels (x_i - x_i', y_i - y_i'). The affine transformation coefficients are
determined by first calculating the pixel values (x_i', y_i') from the difference vectors and the pixel
values (x_i, y_i), and then solving the multi-dimensional transformation equations of function block 358
with respect to the pixel values (x_i', y_i').

As shown in Fig. 14, pixels 364a-364c preferably are distributed about transformation block
356 to minimize the sensitivity of the quantization to local variations within transformation block
356. Preferably, pixel 364a is positioned at or adjacent the center of transformation block 356, and
pixels 364b and 364c are positioned at upper corners. Also in the preferred embodiment, the selected
pixels for each of the transformation blocks 356 in object 204b have the same positions, thereby
allowing the quantization process to be performed efficiently.
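
The quantization step can be sketched as below. The sketch assumes that "converting to integer format" means rounding to the nearest integer, and it places the three selected pixels at the center and the two upper corners of a 32x32 block; both choices, and the function names, are illustrative assumptions.

```python
import numpy as np

def quantize_block_motion(coeffs, selected):
    """Integer difference vectors (x_i - x_i', y_i - y_i') for three selected pixels.

    coeffs: 3x2 matrix [[a, d], [b, e], [c, f]]; selected: (3, 2) integer coordinates.
    """
    selected = np.asarray(selected, dtype=float)
    design = np.column_stack([selected, np.ones(3)])
    mapped = design @ coeffs                       # floating-point (x_i', y_i')
    return np.rint(selected - mapped).astype(int)  # rounding is an assumption

def dequantize_block_motion(diffs, selected):
    """Rebuild (x_i', y_i') from the differences, then solve for a..f."""
    selected = np.asarray(selected, dtype=float)
    design = np.column_stack([selected, np.ones(3)])
    mapped = selected - diffs
    return np.linalg.solve(design, mapped)         # three points give six equations

selected = np.array([[16, 16], [0, 0], [31, 0]])   # center and upper corners
coeffs = np.array([[1.02, 0.01], [-0.03, 0.98], [2.4, -1.7]])
diffs = quantize_block_motion(coeffs, selected)
rebuilt = dequantize_block_motion(diffs, selected)
```

Only the six integers in `diffs` need to be stored or transmitted; the decoder side recovers approximate coefficients whose mapped positions differ from the originals by at most half a pixel at the selected pixels.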

Another aspect of the quantization method of function block 362 is that different levels of
quantization may be used to represent varying degrees of motion. As a result, relatively simple
motion (e.g., translation) may be represented by fewer selected pixels 364 than are required to
represent complex motion. With respect to the affine transformation equations described above,
pixel 364a (x_1, y_1) from object 204b and the corresponding pixel (x_1', y_1') from object 204a are
sufficient to solve simplified affine transformation equations of the form:

x_1' = x_1 + c

y_1' = y_1 + f,

which represent translation between successive image frames. Pixel 364a specifically is used
because its central position generally represents translational motion independent of the other types
of motion. Accordingly, a user may selectively represent simplified motion such as translation with
simplified affine transformation equations that require one-third the data required to represent
complex motion.

Similarly, a pair of selected pixels (x_1, y_1) (e.g., pixel 364a) and (x_2, y_2) (i.e., either of pixels
364b and 364c) from object 204b and the corresponding pixels (x_1', y_1') and (x_2', y_2') from object
204a are sufficient to solve simplified affine transformation equations of the form:

x_i' = a x_i + c

y_i' = e y_i + f,

which are capable of representing motions that include translation and magnification between
successive image frames. In the simplified form:

x_i' = a \cos\theta x_i + \sin\theta y_i + c

y_i' = -\sin\theta x_i + a \cos\theta y_i + f

the corresponding pairs of selected pixels are capable of representing motions that include
translation, rotation, and isotropic magnification. In this simplified form, the common coefficients of
the x and y variables allow the equations to be solved by two corresponding pairs of pixels.

Accordingly, a user may selectively represent moderately complex motion that includes
translation, rotation, and magnification with partly simplified affine transformation equations. Such
equations would require two-thirds the data required to represent complex motion. Adding the third
selected pixel (x_3, y_3) from object 204b, the corresponding pixel (x_3', y_3') from object 204a, and the
complete preferred affine transformation equations allows a user also to represent shear between
successive image frames.

A preferred embodiment of transformation method 350 (Fig. 12) is described as using
uniform transformation blocks 356 having dimensions of, for example, 32x32 pixels. The preferred
multi-dimensional affine transformations described with reference to function block 358 are
determined with reference to transformation blocks 356. It will be appreciated that the dimensions of
transformation blocks 356 directly affect the compression ratio provided by this method.

Fewer transformation blocks 356 of relatively large dimensions are required to represent
transformations of an object between image frames than the number of transformation blocks 356
having smaller dimensions. A consequence of uniformly large transformation blocks 356 is that
correspondingly greater error can be introduced for each transformation block. Accordingly,
uniformly sized transformation blocks 356 typically have moderate dimensions to balance these
conflicting performance constraints.
Transformation Block Optimization

Fig. 15 is a functional block diagram of a transformation block optimization method 370
that automatically selects transformation block dimensions that provide a minimal error threshold. 
Optimization method 370 is described with reference to Fig. 16, which is a simplified representation 
of display screen 50 showing a portion of image frame 202b with object 204b. 

Function block 372 indicates that an initial transformation block 374 is defined with respect 
to object 204b. Initial transformation block 374 preferably is of maximal dimensions that are 
selectable by a user and are, for example, 64x64 pixels. Initial transformation block 374 is 
designated the current transformation block. 

Function block 376 indicates that a current signal-to-noise ratio (CSNR) is calculated with
respect to the current transformation block. The signal-to-noise ratio preferably is calculated as the
ratio of the variance of the color component values of the pixels within the current transformation
block (i.e., the signal) to the variance of the color component values of the pixels associated with
estimated error 98 (Fig. 3).

Function block 378 indicates that the current transformation block (e.g., transformation 
block 374) is subdivided into, for example, four equal sub-blocks 380a-380d, affine transformations 
are determined for each of sub-blocks 380a-380d, and a future signal-to-noise ratio is determined 
with respect to the affine transformations. The future signal-to-noise ratio is calculated in 
substantially the same manner as the current signal-to-noise ratio described with reference to function
block 376. 

Inquiry block 382 represents an inquiry as to whether the future signal-to-noise ratio is 
greater than the current signal-to-noise ratio by more than a user-selected threshold amount. This 
inquiry represents a determination that further subdivision of the current transformation block (e.g., 
transformation block 374) would improve the accuracy of the affine transformations by at least the 
threshold amount. Whenever the future signal-to-noise ratio is greater than the current signal-to- 
noise ratio by more than the threshold amount, inquiry block 382 proceeds to function block 384, and 
otherwise proceeds to function block 388. 

Function block 384 indicates that sub-blocks 380a-380d are successively designated the
current transformation block, and each is analyzed to determine whether it is to be further subdivided.
For purposes of illustration, sub-block 380a is designated the current transformation block and processed
according to function block 376 and further sub-divided into sub-blocks 386a-386d. Function block 388
indicates that a next successive transformation block 374' is identified and designated an initial or current
transformation block.
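
The subdivision loop of function blocks 372-388 can be sketched as a recursion. In this sketch `snr_of` is a hypothetical caller-supplied function that fits one affine transformation per listed block and returns the resulting signal-to-noise ratio; the `min_size` stopping rule is also an assumption, since the text does not state a minimum block size.

```python
def optimize_blocks(x, y, size, snr_of, threshold, min_size=8):
    """Return the list of (x, y, size) transformation blocks chosen for one initial block.

    snr_of(blocks): hypothetical callback giving the signal-to-noise ratio
    obtained when each (x, y, size) block in `blocks` gets its own affine fit.
    """
    half = size // 2
    if half < min_size:
        return [(x, y, size)]
    current_snr = snr_of([(x, y, size)])
    subs = [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]
    future_snr = snr_of(subs)
    if future_snr - current_snr > threshold:
        chosen = []
        for sx, sy, ssize in subs:        # each sub-block becomes the current block
            chosen.extend(optimize_blocks(sx, sy, ssize, snr_of, threshold, min_size))
        return chosen
    return [(x, y, size)]                 # subdivision would not help enough

# Toy SNR model (illustrative only): depends on block size alone.
def fake_snr(blocks):
    return {64: 10.0, 32: 20.0, 16: 21.0, 8: 21.5}[blocks[0][2]]

blocks = optimize_blocks(0, 0, 64, fake_snr, threshold=5.0)
```

With the toy model, splitting the 64x64 block gains 10 units of SNR and is accepted, while splitting each 32x32 block gains only 1 unit and is rejected.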
Precompression Extrapolation Method

Figs. 17A and 17B are a functional block diagram of a precompression extrapolation method
400 for extrapolating image features of arbitrary configuration to a predefined configuration to
facilitate compression in accordance with function block 112 of encoder process 64 (both of Fig. 3).
Extrapolation method 400 allows the compression of function block 112 to be performed in a
conventional manner such as DCT or lattice wavelet compression, as described above.

Conventional still image compression methods such as lattice wavelet compression or
discrete cosine transforms (DCT) operate upon rectangular arrays of pixels. As described above, 
however, the methods of the present invention are applicable to image features or objects of arbitrary 
configuration. Extrapolating such objects or image features to a rectangular pixel array configuration 
allows use of conventional still image compression methods such as lattice wavelet compression or 
DCT. Extrapolation method 400 is described below with reference to Figs. 18A-18D, which are 
representations of display screen 50 on which a simple object 402 is rendered to show various aspects 
of extrapolation method 400. 

Function block 404 indicates that an extrapolation block boundary 406 is defined about
object 402. Extrapolation block boundary 406 preferably is rectangular. Referring to Fig. 18A, the
formation of extrapolation block boundary 406 about object 402 is based upon an identification of a
perimeter 408 of object 402 by, for example, object segmentation method 140 (Fig. 4). Extrapolation
block boundary 406 is shown encompassing object 402 in its entirety for purposes of illustration. It
will be appreciated that extrapolation block boundary 406 could alternatively encompass only a
portion of object 402. As described with reference to object segmentation method 140, pixels
included in object 402 have color component values that differ from those of pixels not included in
object 402.

Function block 410 indicates that all pixels 412 bounded by extrapolation block boundary 
406 and not included in object 402 are assigned a predefined value such as, for example, a zero value 
for each of the color components. 

Function block 414 indicates that horizontal lines of pixels within extrapolation block
boundary 406 are scanned to identify horizontal lines with horizontal pixel segments having both
zero and non-zero color component values.

Function block 416 represents an inquiry as to whether the horizontal pixel segments having
color component values of zero are bounded at both ends by perimeter 408 of object 402. Referring
to Fig. 18B, region 418 represents horizontal pixel segments having color component values of zero
that are bounded at both ends by perimeter 408. Regions 420 represent horizontal pixel segments
that have color component values of zero and are bounded at only one end by perimeter 408.
Function block 416 proceeds to function block 426 for regions 418 in which the pixel segments have
color component values of zero bounded at both ends by perimeter 408 of object 402, and otherwise
proceeds to function block 422.

Function block 422 indicates that the pixels in each horizontal pixel segment of a region 420
are assigned the color component values of a pixel 424 (only exemplary ones shown) in the
corresponding horizontal line and on perimeter 408 of object 402. Alternatively, the color component
values assigned to the pixels in regions 420 are functionally related to the color component values of
pixels 424.

Function block 426 indicates that the pixels in each horizontal pixel segment in region 418 
are assigned color component values corresponding to, and preferably equal to, an average of the 
color component values of pixels 428a and 428b that are in the corresponding horizontal lines and on 
perimeter 408. 

Function block 430 indicates that vertical lines of pixels within extrapolation block
boundary 406 are scanned to identify vertical lines with vertical pixel segments having both zero and
non-zero color component values.

Function block 432 represents an inquiry as to whether the vertical pixel segments in
vertical lines having color component values of zero are bounded at both ends by perimeter 408 of
object 402. Referring to Fig. 18C, region 434 represents vertical pixel segments having color
component values of zero that are bounded at both ends by perimeter 408. Regions 436 represent
vertical pixel segments that have color component values of zero and are bounded at only one end by
perimeter 408. Function block 432 proceeds to function block 444 for region 434 in which the
vertical pixel segments have color component values of zero bounded at both ends by perimeter 408
of object 402, and otherwise proceeds to function block 438.

Function block 438 indicates that the pixels in each vertical pixel segment of region 436 are
assigned the color component values of pixels 442 (only exemplary ones shown) in the vertical lines
and on perimeter 408 of object 402. Alternatively, the color component values assigned to the pixels in
region 436 are functionally related to the color component values of pixels 442.

Function block 444 indicates that the pixels in each vertical pixel segment in region 434 are
assigned color component values corresponding to, and preferably equal to, an average of the color
component values of pixels 446a and 446b that are in the corresponding vertical lines and on perimeter 408.

Function block 448 indicates that pixels that are in both horizontal and vertical pixel
segments that are assigned color component values according to this method are assigned composite
color component values that relate to, and preferably are the average of, the color component values
otherwise assigned to the pixels according to their horizontal and vertical pixel segments.

Examples of pixels assigned such composite color component values are those pixels in
regions 418 and 434.

Function block 450 indicates that regions 452 of pixels bounded by extrapolation block
boundary 406 and not intersecting perimeter 408 of object 402 along a horizontal or vertical line are
assigned composite color component values that are related to, and preferably equal to the average
of, the color component values assigned to adjacent pixels. Referring to Fig. 18D, each of pixels 454
in regions 452 is assigned a color component value that preferably is the average of the color
component values of pixels 456a and 456b that are aligned with pixel 454 along respective horizontal
and vertical lines and have non-zero color component values previously assigned by this method.

A benefit of object extrapolation process 400 is that it assigns smoothly varying color
component values to pixels not included in object 402 and therefore optimizes the compression
capabilities and accuracy of conventional still image compression methods. In contrast, prior art zero
padding or mirror image methods, as described by Chang et al., "Transform Coding of Arbitrarily-Shaped
Image Segments," ACM Multimedia, pp. 83-88, June, 1993, apply compression to
extrapolated objects that are filled with pixels having zero color component values such as those
applied in function block 410. The drastic image change that occurs between an object and the
zero-padded regions introduces high frequency changes that are difficult to compress or introduce image
artifacts upon compression. Object extrapolation method 400 overcomes such disadvantages.
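
The per-line fill rule used in both the horizontal and vertical passes can be sketched for a single scan line as follows. The function name `fill_line` and the boolean `inside` mask are illustrative assumptions; the averaging of the two passes and the handling of regions 452 are not shown.

```python
import numpy as np

def fill_line(values, inside):
    """Fill the non-object pixels of one scan line.

    values: 1-D array of one color component along the line.
    inside: 1-D bool array, True where the pixel belongs to the object.
    Segments bounded at both ends by the object receive the average of the
    two bounding perimeter pixels; segments bounded at one end receive a
    copy of that perimeter pixel. Lines without object pixels are returned
    unchanged.
    """
    out = np.array(values, dtype=float)
    idx = np.flatnonzero(inside)
    if idx.size == 0:
        return out
    first, last = idx[0], idx[-1]
    out[:first] = out[first]              # bounded at one end only
    out[last + 1:] = out[last]
    for left, right in zip(idx[:-1], idx[1:]):
        if right - left > 1:              # gap bounded at both ends
            out[left + 1:right] = (out[left] + out[right]) / 2.0
    return out

line = fill_line([0, 0, 10, 0, 0, 20, 0],
                 [False, False, True, False, False, True, False])
# line is [10, 10, 10, 15, 15, 20, 20]
```

Applying the same function to each row and each column, and then averaging the two results where both passes assign a value, corresponds to the composite values of function block 448.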
Alternative Encoder Method 

Fig. 19A is a functional block diagram of an encoder method 500 that employs a Laplacian
pyramid encoder with unique filters that maintain nonlinear aspects of image features, such as edges,
while also providing high compression. Conventional Laplacian pyramid encoders are described, for
example, in The Laplacian Pyramid as a Compact Image Code by Burt and Adelson, IEEE Trans.
Comm., Vol. 31, No. 4, pp. 532-540, April 1983. Encoder method 500 is capable of providing the
encoding described with reference to function block 112 of video compression encoder process 64
shown in Fig. 3, as well as wherever else DCT or wavelet encoding is suggested or used. By way of
example, encoder method 500 is described with reference to encoding of estimated error 110 (Fig. 3).

A first decimation filter 502 receives pixel information corresponding to an estimated error
110 (Fig. 3) and filters the pixels according to a filter criterion. In a conventional Laplacian pyramid
method, the decimation filter is a low-pass filter such as a Gaussian weighting function. In
accordance with encoder method 500, however, decimation filter 502 preferably employs a median
filter and, more specifically, a 3x3 nonseparable median filter.

To illustrate, Fig. 20A is a simplified representation of the color component values for one
color component (e.g., red) for an arbitrary set or array of pixels 504. Although described with
particular reference to red color component values, this illustration is similarly applied to the green
and blue color component values of pixels 504.

With reference to the preferred embodiment of decimation filter 502, filter blocks 506
having dimensions of 3x3 pixels are defined among pixels 504. For each pixel block 506, the
median pixel intensity value is identified or selected. With reference to pixel blocks 506a-506c, for
example, decimation filter 502 provides the respective values of 8, 9, and 10, which are listed as the
first three pixels 512 in Fig. 20B.

It will be appreciated, however, that decimation filter 502 could employ other median filters
according to this invention. Accordingly, for each group of pixels having associated color
component values of {a_0, a_1, ..., a_{n-1}} the median filter would select a median value a_M.

A first 2x2 down sampling filter 514 samples alternate pixels 512 in vertical and horizontal
directions to provide additional compression. Fig. 20C represents a resulting compressed set of
pixels 515.

A 2x2 up sample filter 516 inserts a pixel of zero value in place of each pixel 512 omitted
by down sampling filter 514, and interpolation filter 518 assigns to the zero-value pixel a pixel value
of an average of the opposed adjacent pixels, or a previous assigned value if the zero-value pixel is
not between an opposed pair of non-zero value pixels. To illustrate, Fig. 20D represents a resulting
set or array of value pixels 520.

A difference 522 is taken between the color component values of the set of pixels 504 and
the corresponding color component values for set of pixels 520 to form a zero-order image
component I_0.

A second decimation filter 526 receives color component values corresponding to the
compressed set of pixels 515 generated by first 2x2 down sampling filter 514. Decimation filter 526
preferably is the same as decimation filter 502 (e.g., a 3x3 nonseparable median filter). Accordingly,
decimation filter 526 functions in the same manner as decimation filter 502 and delivers a resulting
compressed set or array of pixels (not shown) to a second 2x2 down sampling filter 528.

Down sampling filter 528 functions in the same manner as down sampling filter 514 and
forms a second order image component L_2 that also is delivered to a 2x2 up sample filter 530 and an
interpolation filter 531 that function in the same manner as up sample filter 516 and interpolation
filter 518, respectively. A difference 532 is taken between the color component values of the set of
pixels 515 and the resulting color component values provided by interpolation filter 531 to form a
first-order image component I_1.

The image components I_0, I_1, and L_2 are respective nxn, (n/2)x(n/2), and (n/4)x(n/4) sets of
color component values that represent the color component values for an nxn array of pixels 504.

Image component I_0 maintains the high frequency components (e.g., edges) of an image
represented by the original set of pixels 504. Image components I_1 and L_2 represent low frequency
aspects of the original image. Image components I_0, I_1, and L_2 provide relative compression of the
original image. Image components I_0 and I_1 maintain high frequency features (e.g., edges) in a format
that is highly compressible due to the relatively high correlation between the values of adjacent
pixels. Image component L_2 is not readily compressible because it includes primarily low frequency
image features, but is a set of relatively small size.

Fig. 19B is a functional block diagram of a decoder method 536 that decodes or inverse
encodes image components I_0, I_1, and L_2 generated by encoder method 500. Decoder method 536
includes a first 2x2 up sample filter 538 that receives image component L_2 and interposes a pixel of
zero value between each adjacent pair of pixels. An interpolation filter 539 assigns to the zero-value
pixel a pixel value that preferably is an average of the values of the adjacent pixels, or a previous
assigned value if the zero-value pixel is not between an opposed pair of non-zero-value pixels. First
2x2 up sample filter 538 operates in substantially the same manner as up sample filters 516 and 530
of Fig. 19A, and interpolation filter 539 operates in substantially the same manner as interpolation
filters 518 and 531.

A sum 540 is determined between image component I_1 and the color component values
corresponding to the decompressed set of pixels generated by first 2x2 up sample filter 538 and
interpolation filter 539. A second 2x2 up sample filter 542 interposes a pixel of zero value between
each adjacent pair of pixels generated by sum 540. An interpolation filter 543 assigns to the
zero-value pixel a pixel value that includes an average of the values of the adjacent pixels, or a previous
assigned value if the zero-value pixel is not between an opposed pair of non-zero-value pixels. Up
sample filter 542 and interpolation filter 543 are substantially the same as up sample filter 538 and
interpolation filter 539, respectively.

A sum 544 sums the image component I_0 with the color component values corresponding to
the decompressed set of pixels generated by second 2x2 up sample filter 542 and interpolation filter
543. Sum 544 provides decompressed estimated error 110 corresponding to the estimated error 110
delivered to encoder process 500.
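
The encoder of Fig. 19A and the decoder of Fig. 19B can be sketched together as follows. This is a minimal two-level illustration under stated assumptions: image dimensions divisible by four, edge replication at the borders of the median filter, and a simple neighbour-averaging interpolation. The function names are hypothetical, and no quantization or entropy coding of the components is shown.

```python
import numpy as np

def median3x3(img):
    """3x3 nonseparable median filter; border handling by edge replication is an assumption."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = [p[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0)

def downsample(img):
    return img[::2, ::2]                  # keep alternate pixels in both directions

def upsample(small):
    """Re-insert omitted pixels and interpolate them from opposed neighbours."""
    h, w = small.shape
    up = np.zeros((2 * h, 2 * w))
    up[::2, ::2] = small
    up[::2, 1:-1:2] = (small[:, :-1] + small[:, 1:]) / 2.0   # between left and right
    up[::2, -1] = small[:, -1]            # no right neighbour: reuse previous value
    up[1:-1:2, :] = (up[0:-2:2, :] + up[2::2, :]) / 2.0      # between rows above and below
    up[-1, :] = up[-2, :]                 # no row below: reuse previous value
    return up

def encode(img):
    img = np.asarray(img, dtype=float)
    level1 = downsample(median3x3(img))   # corresponds to pixels 515
    i0 = img - upsample(level1)           # zero-order component I_0
    l2 = downsample(median3x3(level1))    # second-order component L_2
    i1 = level1 - upsample(l2)            # first-order component I_1
    return i0, i1, l2

def decode(i0, i1, l2):
    level1 = i1 + upsample(l2)            # sum 540
    return i0 + upsample(level1)          # sum 544

img = np.arange(64).reshape(8, 8)
i0, i1, l2 = encode(img)
restored = decode(i0, i1, l2)
```

Because the decoder applies the same up sampling and interpolation that the encoder subtracted, the unquantized components reconstruct the input exactly, and the three components have the nxn, (n/2)x(n/2), and (n/4)x(n/4) sizes described above.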
Transform Coding of Motion Vectors

Conventional video compression encoder processes, such as MPEG-1 or MPEG-2, utilize
only sparse motion vector fields to represent the motion of significantly larger pixel arrays of a
regular size and configuration. The motion vector fields are sparse in that only one motion vector is
used to represent the motion of a pixel array having dimensions of, for example, 16x16 pixels. The
sparse motion vector fields, together with transform encoding of underlying images or pixels by, for
example, discrete cosine transform (DCT) encoding, provide conventional video compression
encoding.

In contrast, video compression encoding process 64 (Fig. 3) utilizes dense motion vector
fields in which motion vectors are determined for all, or virtually all, pixels of an object. Such dense
motion vector fields significantly improve the accuracy with which motion between corresponding
pixels is represented. Although the increased accuracy can significantly reduce the errors associated
with conventional sparse motion vector field representations, the additional information included in
dense motion vector fields represents an increase in the amount of information representing a video
sequence. In accordance with this invention, therefore, dense motion vector fields are themselves
compressed or encoded to improve the compression ratio provided by this invention.

Fig. 21 is a functional block diagram of a motion vector encoding process 560 for encoding
or compressing motion vector fields and, preferably, dense motion vector fields such as those
generated in accordance with dense motion transformation 96 of Fig. 3. It will be appreciated that
such dense motion vector fields from a selected object typically will have greater continuity or
"smoothness" than the underlying pixels corresponding to the object. As a result, compression or
encoding of the dense motion vector fields will attain a greater compression ratio than would
compression or encoding of the underlying pixels.

Function block 562 indicates that a dense motion vector field is obtained for an object or a
portion of an object in accordance with, for example, the processes of function block 96 described
with reference to Fig. 3. Accordingly, the dense motion vector field will correspond to an object or
other image portion of arbitrary configuration or size.

Function block 564 indicates that the configuration of the dense motion vector field is
extrapolated to a regular, preferably rectangular, configuration to facilitate encoding or compression.
Preferably, the dense motion vector field configuration is extrapolated to a regular configuration by
precompression extrapolation method 400 described with reference to Figs. 17A and 17B. It will be
appreciated that conventional extrapolation methods, such as a mirror image method, could
alternatively be utilized.

Function block 566 indicates that the dense motion vector field with its extrapolated regular
configuration is encoded or compressed according to conventional encoding transformations such as,
for example, discrete cosine transformation (DCT) or lattice wavelet compression, the former of
which is preferred.

Function block 568 indicates that the encoded dense motion vector field is further
compressed or encoded by a conventional lossless still image compression method such as entropy
encoding to form an encoded dense motion vector field 570. Such a still image compression method
is described with reference to function block 114 of Fig. 3.



Compression of Quantized Objects From Previous Video Frames 

Referring to Fig. 3, video compression encoder process 64 uses quantized prior object 98
determined with reference to a prior frame N-1 to encode a corresponding object in a next successive
frame N. As a consequence, encoder process 64 requires that quantized prior object 98 be stored in
an accessible memory buffer. With conventional video display resolutions, such a memory buffer
would require a capacity of at least one megabyte to store the quantized prior object 98 for a single
video frame. Higher resolution display formats would require correspondingly larger memory
buffers.

Fig. 22 is a functional block diagram of a quantized object encoder-decoder (codec) process
600 that compresses and selectively decompresses quantized prior objects 98 to reduce the required
capacity of a quantized object memory buffer.

Function block 602 indicates that each quantized object 98 in an image frame is encoded in
a block-by-block manner by a lossy encoding or compression method such as discrete cosine
transform (DCT) encoding or lattice sub-band (wavelet) compression.

Function block 604 indicates that the encoded or compressed quantized objects are stored in 
a memory buffer (not shown). 

Function block 606 indicates that encoded quantized objects are retrieved from the memory 
buffer in anticipation of processing a corresponding object in a next successive video frame. 

Function block 608 indicates that the encoded quantized object is inverse encoded by, for 
example, DCT or wavelet decoding according to the encoding processes employed with respect to 
function block 602. 

Codec process 600 allows the capacity of the corresponding memory buffer to be reduced
by up to about 80%. Moreover, it will be appreciated that codec process 600 would be similarly
applicable to the decoder process corresponding to video compression encoder process 64.

Video Compression Decoder Process Overview

Video compression encoder process 64 of Fig. 3 provides encoded or compressed 
representations of video signals corresponding to video sequences of multiple image frames. The 
compressed representations include object masks 66, feature points 68, affine transform coefficients
104, and compressed error data 116 from encoder process 64 and compressed master objects 136
from encoder process 130. These compressed representations facilitate storage or transmission of 
video information, and are capable of achieving compression ratios of up to 300 percent greater than 
those achievable by conventional video compression methods such as MPEG-2. 

It will be appreciated, however, that retrieving such compressed video information from 
data storage or receiving transmission of the video information requires that it be decoded or 
decompressed to reconstruct the original video signal so that it can be rendered by a display device 
such as video display device 52 (Figs. 2A and 2B). As with conventional encoding processes such as 
MPEG-1, MPEG-2, and H.26X, the decompression or decoding of the video information is 
substantially the inverse of the process by which the original video signal is encoded or compressed. 

Fig. 23A is a functional block diagram of a video compression decoder process 700 for 
decompressing video information generated by video compression encoder process 64 of Fig. 3. For 
purposes of consistency with the description of encoder process 64, decoder process 700 is described 
with reference to Figs. 2A and 2B. Decoder process 700 retrieves from memory or receives as a 
transmission encoded video information that includes object masks 66, feature points 68, compressed 
master objects 136, affine transform coefficients 104, and compressed error data 116.

Decoder process 700 performs operations that are the inverse of those of encoder process 64
(Fig. 3). Accordingly, each of the above-described preferred operations of encoder process 64
having a decoding counterpart would similarly be inverted.

Function block 702 indicates that masks 66, feature points 68, transform coefficients 104,
and compressed error data 116 are retrieved from memory or received as a transmission for
processing by decoder process 700.

Fig. 23B is a functional block diagram of a master object decoder process 704 for decoding
or decompressing compressed master object 136. Function block 706 indicates that compressed
master object data 136 are entropy decoded by the inverse of the conventional lossless entropy
encoding method in function block 134 of Fig. 3B. Function block 708 indicates that the entropy
decoded master object from function block 706 is decoded according to an inverse of the
conventional lossy wavelet encoding process used in function block 132 of Fig. 3B.

Function block 712 indicates that dense motion transformations, preferably multi-dimensional
affine transformations, are generated from affine coefficients 104. Preferably, affine
coefficients 104 are quantized in accordance with transformation method 350 (Fig. 12), and the
affine transformations are generated from the quantized affine coefficients by performing the inverse
of the operations described with reference to function block 362 (Fig. 12).

Function block 714 indicates that a quantized form of an object 716 in a prior frame N-1
(e.g., rectangular solid object 56a in image frame 54a) provided via a timing delay 718 is
transformed by the dense motion transformation to provide a predicted form of the object 720 in a
current frame N (e.g., rectangular solid object 56b in image frame 54b).

Function block 722 indicates that for image frame N, predicted current object 720 is added
to a quantized error 724 generated from compressed error data 116. In particular, function block 726
indicates that compressed error data 116 is decoded by an inverse process to that of compression
process 114 (Fig. 3A). In the preferred embodiment, function blocks 114 and 726 are based upon a
conventional lossless still image compression method such as entropy encoding.

Function block 728 indicates that the entropy decoded error data from function block 726 is
further decompressed or decoded by a conventional lossy still image compression method
corresponding to that utilized in function block 112 (Fig. 3A). In the preferred embodiment, the
decompression or decoding of function block 728 is by a lattice subband (wavelet) process or a
discrete cosine transform (DCT) process.

Function block 722 provides quantized object 730 for frame N as the sum of predicted
object 720 and quantized error 724, representing a reconstructed or decompressed object 732 that is
delivered to function block 718 for reconstruction of the object in subsequent frames.

Function block 734 indicates that quantized object 732 is assembled with other objects of a 
current image frame N to form a decompressed video signal. 

Simplified Chain Encoding

Masks, objects, sprites, and other graphical features commonly are represented by their
contours. As shown in and explained with reference to Fig. 5A, for example, rectangular solid object
56a is bounded by an object perimeter or contour 142. A conventional process of encoding or
compressing contours is referred to as chain encoding.

Fig. 24A shows a conventional eight-point chain code 800 from which contours on a
conventional rectilinear pixel array are defined. Based upon a current pixel location X, a next
successive pixel location in the contour extends in one of directions 802a-802h. The chain code
value for the next successive pixel is the numeric value corresponding to the particular direction 802.
As examples, the right, horizontal direction 802a corresponds to the chain code value 0, and the
downward, vertical direction 802g corresponds to the chain code value 6. Any continuous contour
can be described from eight-point chain code 800.

With reference to Fig. 24B, a contour 804 represented by pixels 806 designated X and A-G
can be encoded in a conventional manner by the chain code sequence {00764432}. In particular,
beginning from pixel X, pixels A and B are positioned in direction 0 relative to respective pixels X
and A. Pixel C is positioned in direction 7 relative to pixel B. Remaining pixels D-G are similarly
positioned in directions corresponding to the chain code values listed above. In a binary
representation, each conventional chain code value is represented by three digital bits.
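
The conventional eight-point chain code can be sketched as follows. Only code values 0 (right) and 6 (down) are stated in the text; the remaining entries of the `OFFSETS` table are an assumption that the codes follow the usual counterclockwise ordering, and the function names are hypothetical.

```python
# (dx, dy) offsets for chain code values 0-7, with y increasing downward.
# Codes 0 (right) and 6 (down) come from the description; the other six
# entries are ASSUMED to follow a counterclockwise ordering.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def encode_contour(pixels):
    """Convert an ordered list of adjacent (x, y) pixels to chain code values."""
    codes = []
    for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
        codes.append(OFFSETS.index((x1 - x0, y1 - y0)))
    return codes


def decode_contour(start, codes):
    """Rebuild the pixel positions from a starting pixel and chain code values."""
    pixels = [start]
    for code in codes:
        dx, dy = OFFSETS[code]
        x, y = pixels[-1]
        pixels.append((x + dx, y + dy))
    return pixels


codes = [0, 0, 7, 6, 4, 4, 3, 2]          # the sequence {00764432}
pixels = decode_contour((0, 0), codes)    # pixel X placed at the origin
bits = 3 * len(codes)                     # three bits per conventional code
```

Under the assumed ordering, the decoded positions trace a loop that returns to the starting pixel, and the eight code values occupy 24 bits.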

Fig. 25A is a functional block diagram of a chain code process 810 capable of providing
contour compression ratios at least about twice those of conventional chain code processes. Chain
code process 810 achieves such improved compression ratios by limiting the number of chain codes
and defining them relative to the alignment of adjacent pairs of pixels. Based upon experimentation,
it has been discovered that the limited chain codes of chain code process 810 directly represent more
than 99.8% of pixel alignments of object or mask contours. Special case chain code modifications
accommodate the remaining less than 0.2% of pixel alignment as described below in greater detail.

Function block 816 indicates that a contour is obtained for a mask, object, or sprite. The
contour may be obtained, for example, by object segmentation process 140 described with reference
to Figs. 4 and 5.

Function block 818 indicates that an initial pixel in the contour is identified. The initial 
pixel may be identified by common methods such as, for example, a pixel with minimal X-axis and 
Y-axis coordinate positions. 

Function block 820 indicates that a predetermined chain code is assigned to represent the
relationship between the initial pixel and the next adjacent pixel in the contour. Preferably, the
predetermined chain code corresponds to a forward direction.

Fig. 25B is a diagrammatic representation of a three-point chain code 822. Chain code 822
includes three chain codes 824a, 824b, and 824c that correspond to a forward direction 826a, a
leftward direction 826b, and a rightward direction 826c, respectively. Directions 826a-826c are
defined relative to a preceding alignment direction 828 between a current pixel 830 and an adjacent
pixel 832 representing the preceding pixel in the chain code.

Preceding alignment direction 828 may extend in any of the directions 802 shown in Fig.
24A, but is shown with a specific orientation (i.e., right, horizontal) for purposes of illustration.
Direction 826a is defined, therefore, as the same as direction 828. Directions 826b and 826c differ
from direction 828 by leftward and rightward displacements of one pixel.

It has been determined experimentally that slightly more than 50% of chain codes 824
correspond to forward direction 826a, and slightly less than 25% of chain codes 824 correspond to
each of directions 826b and 826c.

Function block 836 represents an inquiry as to whether the next adjacent pixel in the contour
conforms to one of directions 826. Whenever the next adjacent pixel in the contour conforms to one
of directions 826, function block 836 proceeds to function block 838, and otherwise proceeds to
function block 840.

Function block 838 indicates that the next adjacent pixel is assigned a chain code 824 
corresponding to its direction 826 relative to the direction 828 along which the adjacent preceding 
pair of pixels are aligned. 
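
The assignment of forward, leftward, and rightward chain codes relative to the preceding alignment direction can be sketched as below. The sketch assumes that a one-pixel leftward or rightward displacement corresponds to a step of +1 or -1 (modulo 8) in a counterclockwise eight-point code with y increasing downward; that mapping, the symbols "F", "L", and "R", and the function name are illustrative assumptions rather than details given in the text.

```python
def to_relative_codes(absolute_codes):
    """Map eight-point codes to forward/left/right codes relative to the
    preceding alignment direction; None marks a nonconformal change."""
    # The first pixel pair receives a predetermined forward code.
    relative = ["F"]
    for prev, cur in zip(absolute_codes, absolute_codes[1:]):
        turn = (cur - prev) % 8
        if turn == 0:
            relative.append("F")   # same direction as the preceding pair
        elif turn == 1:
            relative.append("L")   # ASSUMED: +1 step is a leftward displacement
        elif turn == 7:
            relative.append("R")   # ASSUMED: -1 step is a rightward displacement
        else:
            relative.append(None)  # would require a special-case modification
    return relative


relative = to_relative_codes([0, 0, 7, 6, 4, 4, 3, 2])
print(relative)
```

In this sketch a `None` entry marks a direction change that does not conform to any of the three relative directions and would need to be handled by a substitution step.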

Function block 840 indicates that a pixel sequence conforming to one of directions 826 is
substituted for the actual nonconformal pixel sequence. Based upon experimentation, it has been
determined that such substitutions typically will arise in fewer than 0.2% of pixel sequences in a
contour and may be accommodated by one of six special-case modifications.

Fig. 25C is a diagrammatic representation of the six special-case modifications 842 for
converting nonconformal pixel sequences to pixel sequences that conform to directions 826. Within
each modification 842, a pixel sequence 844 is converted to a pixel sequence 846. In each of pixel
sequences 844 of adjacent respective pixels X1, X2, A, B, the direction between pixels A and B does
not conform to one of directions 826 due to the alignment of pixel A relative to the alignment of
pixels X1 and X2.

In pixel sequence 844a, initial pixel alignments 850a and 852a represent a nonconformal
right-angle direction change. Accordingly, in pixel sequence 846a, pixel A of pixel sequence 844a is
omitted, resulting in a pixel direction 854a that conforms to pixel direction 826a. Pixel sequence
modifications 842b-842f similarly convert nonconformal pixel sequences 844b-844f to conformal
sequences 846b-846f, respectively.

Pixel sequence modifications 842 omit pixels that cause pixel direction alignments that
change by 90° or more relative to the alignments of adjacent preceding pixels X1 and X2. One effect
is to increase the minimum radius of curvature of a contour representing a right angle to over three
pixels. Pixel modifications 842 cause, therefore, a minor loss of extremely fine contour detail.
According to this invention, however, it has been determined that the loss of such details is
acceptable under most viewing conditions.

Function block 860 represents an inquiry as to whether there is another pixel in the contour
to be assigned a chain code. Whenever there is another pixel in the contour to be assigned a chain
code, function block 860 returns to function block 836, and otherwise proceeds to function block 862.

Function block 862 indicates that nonconformal pixel alignment directions introduced or
incurred by the process of function block 840 are removed. In a preferred embodiment, the
nonconformal direction changes may be omitted simply by returning to function block 816 and
repeating process 810 until no nonconformal pixel sequences remain, which typically is achieved in
fewer than 8 iterations. In an alternative embodiment, such incurred nonconformal direction changes
may be corrected in "real-time" by checking for and correcting any incurred nonconformal direction
changes each time a nonconformal direction change is modified.

Function block 864 indicates that a Huffman code is generated from the resulting simplified
chain code. With chain codes 824a-824c corresponding to directions 826a-826c that occur for
about 50%, 25% and 25% of pixels in a contour, respective Huffman codes of 0, 11, and 10 are
assigned. Such first order Huffman codes allow chain code process 810 to represent contours at a bit rate
of less than 1.5 bits per pixel in the contour. Such a bitrate represents approximately a 50%
compression ratio improvement over conventional chain code processes.
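
The bit rate figure can be checked with a short sketch using the Huffman codes 0, 11, and 10 stated above. The symbols "F", "L", and "R" (forward, leftward, rightward) and the function names are illustrative labels, not part of the described process.

```python
# Huffman codes from the description: forward -> 0, leftward -> 11, rightward -> 10.
HUFFMAN = {"F": "0", "L": "11", "R": "10"}


def encode(symbols):
    """Concatenate the Huffman code word for each relative chain code."""
    return "".join(HUFFMAN[s] for s in symbols)


def decode(bits):
    """Decode a bit string; the code is prefix-free, so greedy matching works."""
    inverse = {word: symbol for symbol, word in HUFFMAN.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            symbols.append(inverse[word])
            word = ""
    return symbols


def average_bits(probabilities):
    """Expected code length in bits per contour pixel."""
    return sum(p * len(HUFFMAN[s]) for s, p in probabilities.items())


rate = average_bits({"F": 0.5, "L": 0.25, "R": 0.25})
print(rate, encode("FFLR"))
```

With probabilities of exactly 50%, 25%, and 25% the expected length is 1.5 bits per pixel, half of the three bits used by a conventional chain code value; a forward share slightly above 50% brings the rate slightly below 1.5.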

It will be appreciated that higher order Huffman coding could provide higher compression 
ratios. Higher order Huffman coding includes, for example, assigning predetermined values to 
preselected sequences of first order Huffman codes.

Sprite Generation Overview

Sprite generation is a process used in connection with encoding determinate motion video
(movie) that involves constructing a representative image that represents a non-rectangular video
object in each frame of a video sequence. In sprite generation, bitmaps are accreted into bitmap
series that comprise a plurality of sequential bitmaps of sequential images from an image source.
Accretion is used to overcome the problem of occluded pixels where objects or figures move relative
to one another or where one figure occludes another similar to the way a foreground figure occludes
the background. For example, when a foreground figure moves and reveals some new background,
there is no way to build that new background from a previous bitmap unless the previous bitmap was
first enhanced by including in it the pixels that were going to be uncovered in the subsequent bitmap.
This method takes an incomplete image of a figure and looks forward in time to find any pixels that
belong to the image but are not to be immediately visible. Those pixels are used to create a
composite bitmap for the figure. With the composite bitmap, any future view of the figure can be
created by distorting the composite bitmap.

Implementation of Sprite Generation Using Feature Points 

The encoding process begins by an operator identifying the figures and the parts of the 
figures of a current bitmap from a current bitmap series. Feature or distortion points are selected by 
the operator on the features of the parts about which the parts of the figures move. A current grid of 
triangles is superimposed onto the parts of the current bitmap. The triangles that constitute the 
current grid of triangles are formed by connecting adjacent distortion points. The distortion points 
are the vertices of the triangles. The current location of each triangle on the current bitmap is
determined and stored to the storage device. A portion of data of the current bitmap that defines the
first image within the current location of each triangle is retained for further use. 

A succeeding bitmap that defines a second image of the current bitmap series is received
from the image source, and the figures and the parts of the figure are identified by the operator.
Next, the current grid of triangles from the current bitmap is superimposed onto the succeeding
bitmap. The distortion points of the current grid of triangles are realigned to coincide with the features
of the corresponding figures on the succeeding bitmap. The realigned distortion points form a
succeeding grid of triangles on the succeeding bitmap of the second image. The succeeding location
of each triangle on the succeeding bitmap is determined and stored to the storage device. A portion
of data of the succeeding bitmap that defines the second image within the succeeding location of each
triangle is retained for further use.

The process of determining and storing the current and succeeding locations of each triangle
is repeated for the plurality of sequential bitmaps of the current bitmap series. When that process is
completed, an average image of each triangle in the current bitmap series is determined from the
separately retained data. The average image of each triangle is stored to the storage device.

During playback, the average image of each triangle of the current bitmap series and the
current location of each triangle of the current bitmap are retrieved from the storage device. A
predicted bitmap is generated by calculating a transformation solution for transforming the average
image of each triangle in the current bitmap series to the current location of each triangle of the
current bitmap and applying the transformation solution to the average image of each triangle. The
predicted bitmap is passed to the monitor for display.

In connection with a playback determinate motion video (video game) in which the images
are determined by a controlling program at playback, a sprite bitmap is stored in its entirety on a
storage device. The sprite bitmap comprises a plurality of data bits that define a sprite image. The
sprite bitmap is displayed on a monitor, and the parts of the sprite are identified by an operator and
distortion points are selected for the sprite's parts.

A grid of triangles is superimposed onto the parts of the sprite bitmap. The triangles that
constitute the grid of triangles are formed by connecting adjacent distortion points. The distortion
points are the vertices of the triangles. The location of each triangle of the sprite bitmap is
determined and stored to the storage device.

During playback, a succeeding location of each triangle is received from a controlling
program. The sprite bitmap and the succeeding location of each triangle on the sprite bitmap are
recalled from the storage device and passed to the display processor. The succeeding location of
each triangle is also passed to the display processor.

A transformation solution is calculated for each triangle on the sprite bitmap. A succeeding
bitmap is then generated in the display processor by applying the transformation solution of each
triangle derived from the sprite bitmap that defines the sprite image within the location of each
triangle. The display processor passes the succeeding sprite bitmap to a monitor for display. This
process is repeated for each succeeding location of each triangle requested by the controlling
program.

As shown in Fig. 26, an encoding procedure for a movie motion video begins at step 900 by 
the CPU 22 receiving from an image source a current bitmap series. The current bitmap series 
comprises a plurality of sequential bitmaps of sequential images. The current bitmap series has a 
current bitmap that comprises a plurality of data bits which define a first image from the image 
source. The first image comprises at least one figure having at least one part. 

Proceeding to step 902, the first image is displayed to the operator on the monitor 28. From 
the monitor 28, the figures of the first image on the current bitmap are identified by the operator. 
The parts of the figure on the current bitmap are then identified by the operator at step 904. 

Next, at step 906, the operator selects feature or distortion points on the current bitmap. The 
distortion points are selected so that the distortion points coincide with features on the bitmap where 
relative movement of a part is likely to occur. It will be understood by those skilled in the art that the 
figures, the parts of the figures and the distortion points on a bitmap may be identified by the 
computer system 20 or by assistance from it. It is preferred, however, that the operator identify the 
figures, the parts of the figures and the distortion points on a bitmap. 

Proceeding to step 908, a current grid of triangles is superimposed onto the parts of the 
current bitmap by the computer system 20. With reference to Fig. 27A, the current grid comprises 
triangles formed by connecting adjacent distortion points. The distortion points form the vertices of 
the triangles. More specifically, the first image of the current bitmap comprises a figure, which is a
person 970. The person 970 has six parts corresponding to a head 972, a torso 974, a right arm 976,
a left arm 978, a right leg 980, and a left leg 982. Distortion points are selected on each part of the
person 970 so that the distortion points coincide with features where relative movement of a part is 
likely to occur. A current grid is superimposed over each part with the triangles of each current grid 
formed by connecting adjacent distortion points. Thus, the distortion points form the vertices of the 
triangles. 

At step 910, the computer system 20 determines a current location of each triangle on the 
current bitmap. The current location of each triangle on the current bitmap is defined by the location 
of the distortion points that form the vertices of the triangle. At step 912, the current location of each 
triangle is stored to the storage device. A portion of data derived from the current bitmap that defines 
the first image within the current location of each triangle is retained at step 914. 

Next, at step 916, a succeeding bitmap of the current bitmap series is received by the CPU 
22. The succeeding bitmap comprises a plurality of data bits which define a second image of the 
current bitmap series. The second image may or may not include figures that correspond to the 
figures in the first image. For the following steps, the second image is assumed to have figures that
correspond to the figures in the first image. At step 918, the current grid of triangles is
superimposed onto the succeeding bitmap. The second image with the superimposed triangular grid
is displayed to the operator on the monitor 28.

At step 920, the distortion points are realigned to coincide with corresponding features on
the succeeding bitmap by the operator with assistance from the computer system 20. The computer
system 20 realigns the distortion points using block matching. Any mistakes are corrected by the operator.
With reference to Fig. 27B, the realigned distortion points form a succeeding grid of triangles. The
realigned distortion points are the vertices of the triangles. More specifically, the second image of
the succeeding bitmap of person 970 includes head 972, torso 974, right arm 976, left arm 978, right
leg 980, and left leg 982. In the second image, however, the right arm 976 is raised. The current
grids of the first image have been superimposed over each part and their distortion points realigned to
coincide with corresponding features on the second image. The realigned distortion points define
succeeding grids of triangles. The succeeding grids comprise triangles formed by connecting the
realigned distortion points. Thus, the realigned distortion points form the vertices of the triangles of
the succeeding grids.

Proceeding to step 922, a succeeding location of each triangle of the succeeding bitmap is
determined by the computer system 20. At step 924, the succeeding location of each triangle on the
succeeding bitmap is stored to the storage device. A portion of data derived from the succeeding
bitmap that defines the second image within the succeeding location of each triangle is retained at
step 926. Step 926 leads to decisional step 928 where it is determined if a next succeeding bitmap
exists.

If a next succeeding bitmap exists, the YES branch of decisional step 928 leads to step 930
where the succeeding bitmap becomes the current bitmap. Step 930 returns to step 916 where a
succeeding bitmap of the current bitmap series is received by the CPU 22. If a next succeeding
bitmap does not exist, the NO branch of decisional step 928 leads to step 932 where an average
image for each triangle of the current bitmap series is determined. The average image is the median
value of the pixels of a triangle. Use of the average image makes the process less susceptible to
degeneration. Proceeding to step 934, the average image of each triangle of the current bitmap series
is stored to the storage device.
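
One way to read the median-based average image is sketched below. The sketch assumes that the retained data for a triangle has already been resampled to a common shape, so that each bitmap contributes one value per pixel position, and that the median is taken per position across the bitmaps of the series; both points are interpretive assumptions, and `average_triangle_image` is a hypothetical name.

```python
from statistics import median


def average_triangle_image(samples_per_bitmap):
    """Per-position median across the retained triangle data of each bitmap.

    samples_per_bitmap: list of equal-length pixel-value lists, one list per
    bitmap in the series (ASSUMED to be resampled to a common triangle shape).
    """
    return [median(values) for values in zip(*samples_per_bitmap)]


retained = [
    [10, 20, 30],   # triangle pixels from the first bitmap
    [12, 20, 90],   # second bitmap, with one outlier (e.g. an occluding pixel)
    [11, 22, 31],   # third bitmap
]
average = average_triangle_image(retained)
print(average)
```

Taking a median rather than a mean keeps a single outlying value, such as the 90 above, from pulling the stored image away from the typical appearance of the triangle.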

Next, at step 936, the current location of each triangle on the current bitmap is retrieved
from the storage device. An affine transformation solution for transforming the average image of
each triangle to the current location of the triangle on the current bitmap is then calculated by the
computer system 20 at step 938. At step 940, a predicted bitmap is generated by applying the
transformation solution of the average image of each triangle to the current location of each triangle
on the current bitmap. The predicted bitmap is compared with the current bitmap at step 942.
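
An affine transformation between two triangles is fixed by the three vertex correspondences, which give six equations for six coefficients. The following sketch solves them with Cramer's rule; the parameterization x' = a·x + b·y + c, y' = d·x + e·y + f and the function names are assumptions made for illustration, not the solution method of the described system.

```python
def solve_affine(src, dst):
    """Return ((a, b, c), (d, e, f)) mapping each src vertex to its dst vertex,
    where x' = a*x + b*y + c and y' = d*x + e*y + f."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if det == 0:
        raise ValueError("source triangle is degenerate")

    def coefficients(v0, v1, v2):
        # Cramer's rule for one output coordinate (v is x' or y' at each vertex).
        a = (v0 * (y1 - y2) + v1 * (y2 - y0) + v2 * (y0 - y1)) / det
        b = (v0 * (x2 - x1) + v1 * (x0 - x2) + v2 * (x1 - x0)) / det
        c = (v0 * (x1 * y2 - x2 * y1) + v1 * (x2 * y0 - x0 * y2)
             + v2 * (x0 * y1 - x1 * y0)) / det
        return a, b, c

    xs = coefficients(*(p[0] for p in dst))
    ys = coefficients(*(p[1] for p in dst))
    return xs, ys


def apply_affine(transform, point):
    """Map a single (x, y) point through the affine transform."""
    (a, b, c), (d, e, f) = transform
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


stored = [(0, 0), (1, 0), (0, 1)]       # vertices of the stored triangle image
current = [(2, 3), (4, 3), (2, 6)]      # vertices at the current location
transform = solve_affine(stored, current)
print(apply_affine(transform, (1, 1)))
```

Once the coefficients are known, every pixel position inside the stored triangle can be mapped to its position in the predicted bitmap with `apply_affine`.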

At step 944, a correction bitmap is generated. The correction bitmap comprises the data bits
of the current bitmap that were not accurately predicted by the predicted bitmap. The correction
bitmap is stored to the storage device at step 948. Step 948 leads to decisional step 950 where it is
determined if a succeeding bitmap exists.

If a succeeding bitmap exists, the YES branch of decisional step 950 leads to step 952 where
the succeeding bitmap becomes the current bitmap. Step 952 returns to step 936 where the current
location of each triangle on the current bitmap is retrieved from the storage device. If a next
succeeding bitmap does not exist, the NO branch of decisional step 950 leads to decisional step 954
where it is determined if a succeeding bitmap series exists. If a succeeding bitmap series does not
exist, encoding is finished and the NO branch of decisional step 954 leads to step 956. If a
succeeding bitmap series exists, the YES branch of decisional step 954 leads to step 958 where the
CPU 22 receives the succeeding bitmap series as the current bitmap series. Step 958 returns to step
902 where the figures of the first image of the current bitmap series are identified by the operator.

The process of Fig. 26 describes generation of a sprite or master object 90 for use by 
encoder process 64 of Fig. 3. The process of utilizing master object 90 to form predicted objects 102 
is described with reference to Fig. 28. 

As shown in Fig. 28, the procedure begins at step 1000 with a current bitmap series being
retrieved. The current bitmap series comprises a plurality of sequential bitmaps of sequential images.
The current bitmap series has a current bitmap that comprises a plurality of data bits which define a
first image from the image source. The first image comprises at least one figure having at least one
part.

At step 1002, the average image of each triangle of the current bitmap series is retrieved
from the storage device. The average image of each triangle is then passed to a display processor
(not shown) at step 1004. It will be appreciated that computer system 20 (Fig. 1) can optionally
include a display processor or other dedicated components for executing the processes of this
invention. Proceeding to step 1006, the current location of each triangle on the current bitmap is
retrieved from the storage device. The current location of each triangle is passed to the display
processor at step 1008.

Next, an affine transformation solution for transforming the average image of each triangle
to the current location of each triangle on the current bitmap is calculated by the display processor at
step 1010. Proceeding to step 1012, a predicted bitmap is generated by the display processor by
applying the transformation solution for transforming the average image of each triangle to the
current location of each triangle on the current bitmap.

At step 1014, a correction bitmap for the current bitmap is retrieved from the storage device.
The correction bitmap is passed to the display processor at step 1016. A display bitmap is then
generated in the display processor by overlaying the predicted bitmap with the correction bitmap.
The display processor retains a copy of the average image of each triangle and passes the display
bitmap to the frame buffer for display on the monitor.

Next, at decisional step 1020, it is determined if a succeeding bitmap of the current bitmap
series exists. If a succeeding bitmap of the current bitmap series exists, the YES branch of decisional
step 1020 leads to step 1022. At step 1022, the succeeding bitmap becomes the current bitmap. Step
1022 returns to step 1006 where the location of each triangle on the current bitmap is retrieved from
the storage device.

Returning to decisional step 1020, if a succeeding bitmap of the current bitmap series does
not exist, the NO branch of decisional step 1020 leads to decisional step 1024. At decisional step
1024, it is determined if a succeeding bitmap series exists. If a succeeding bitmap series does not
exist, then the process is finished and the NO branch of decisional step 1024 leads to step 1026. If a
succeeding bitmap series exists, the YES branch of decisional step 1024 leads to step 1028. At step
1028, the succeeding bitmap series becomes the current bitmap series. Step 1028 returns to step
1000.

Simplified Object Coding Using Sprites 

Fig. 29 is a functional block diagram of a simplified compression method 1100 preferably
for use in conjunction with video compression encoder process 64 and video compression decoder
process 700 described with reference to Figs. 3A and 23A, respectively. Simplified object
compression method 1100 is directed to encoding and decoding compressed video representations of
sprite-defined objects, which are defined completely throughout a video sequence from the time the
object first appears, as described below in greater detail.

Simplified compression method 1100, encoder process 64, and decoder process 700
preferably are applied to general video sequences or information that include at least one
sprite-defined object and at least one general video object that is not defined completely from its first
appearance in the video sequence. The general video object or objects of the general video sequence
would preferably be processed by encoder process 64 and decoder process 700 described
hereinabove. The sprite-defined object or objects of the general video sequence would preferably be
processed by simplified compression method 1100.

The sprite-defined object or objects are a subset of the general objects in the general video
sequence and have available more information when they first appear in a video sequence than other
objects. Simplified compression method 1100 allows the additional information available for
sprite-defined objects to be utilized more efficiently than the additional information would be by encoder
process 64 and decoder process 700. As a result, processing the sprite-defined object or objects of
the general video sequence in accordance with simplified process 1100 can further improve bit rate
requirements and efficiency for storing or transmitting the general video information.

Sprite-defined objects are completely defined throughout the video sequence as of their first
appearance by a "sprite" and one or more trajectories. The sprite includes all the image
characteristics of an object throughout the video sequence, and one or more trajectories warp or
transform the sprite to represent the object in each frame of the video sequence. In some
applications, the objects are further defined or modified by lighting or transparency factors that also
are known as of the initial appearance of the object in the video sequence. Sprite-defined objects
arise frequently in, for example, video conferencing and computer-generated graphics applications.
Simplified compression method 1100 is also applicable to, for example, general video compression,
multimedia applications, digital video archiving, network or Internet browsing, and video
transmission.

With reference to Fig. 29, process block 1102 indicates that object information relating to
each of multiple objects in a general video sequence is obtained according to, for example, function
blocks 66-96 of video compression encoder process 64 (Fig. 3A). At least one of the objects in the
video sequence is a sprite-generated object for which all the object information required for the video
sequence is available or "known" at the beginning of the video sequence. The object information for
the sprite-defined object or objects includes a sprite that includes all the image characteristics of the
object throughout the video sequence, and a trajectory that warps or transforms the sprite to represent
the object in each frame of the video sequence. The video sequence is general in that it includes or
can include at least one general video object that is not a sprite-generated object.

With regard to computer-generated graphics, for example, objects are commonly
represented by sprites (or mathematically-defined object models) and trajectories that are available at
the start of a video sequence or when the objects first appear in the video sequence. The trajectory,
which is sometimes referred to as transformation information, is utilized to generate the object
throughout the video sequence and is therefore available. The transformation information may be in
the form of multi-dimensional affine transformations or perspective transformations. As described
hereinabove with reference to transformation method 350 shown in Fig. 12, affine transformations
preferably are represented as:
x' = ax + by + c,
y' = dx + ey + f.

Alternatively, the transformation information may be in the form of perspective transformations that
preferably are represented as:

x' = (ax + by + c)/(gx + hy + 1),
y' = (dx + ey + f)/(gx + hy + 1),
as described, for example, in Digital Image Warping by George Wolberg, IEEE, 1990.
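
The two forms above can be illustrated with a short sketch. The function name `warp_point` and the sample coefficient tuples are hypothetical and chosen only for illustration; the sketch assumes the coefficients are supplied in the order (a, b, c, d, e, f, g, h), so that setting g = h = 0 yields the affine form.

```python
def warp_point(x, y, coeffs):
    """Map (x, y) to (x', y') with the perspective transform.

    coeffs is (a, b, c, d, e, f, g, h); with g = h = 0 the mapping
    reduces to the affine transform x' = ax + by + c, y' = dx + ey + f.
    """
    a, b, c, d, e, f, g, h = coeffs
    w = g * x + h * y + 1.0  # common denominator of both equations
    if w == 0.0:
        raise ZeroDivisionError("point maps to infinity under this transform")
    return (a * x + b * y + c) / w, (d * x + e * y + f) / w


# Illustrative coefficient sets (assumed values, not taken from the description).
identity = (1, 0, 0, 0, 1, 0, 0, 0)
shift = (1, 0, 5, 0, 1, -2, 0, 0)   # affine: translate by (5, -2)
tilt = (1, 0, 0, 0, 1, 0, 0.5, 0)   # perspective term g = 0.5
```

With these assumed values, the point (3, 4) is left unchanged by `identity` and moved to (8, 2) by `shift`, while `tilt` divides both coordinates of (2, 4) by the denominator 2.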

As described with reference to the sprite generating or encoding process of Fig. 26 (e.g.,
step 936), many sprites or objects would commonly be represented by more than one affine transform
and therefore more than one trajectory. It will be appreciated that references in this description of
simplified compression method 1100 to a transform (whether affine or perspective) or trajectory in
the singular are merely illustrative and are not intended to imply a limitation of this method.

With regard to a video conferencing application, for example, an object corresponding to the
background of the video conference scene can be formed as a sprite-defined object that is obtained
by imaging the background alone during a background sprite imaging period. The transformation
information in this example would be an identity transform that maintains the object in a static state
throughout the video scene. In such an application, the simplification of the representation of the
background object could decrease significantly the bandwidth required to transmit a video sequence
or conference.

It will be appreciated that these two examples are merely illustrative and are not limitations
on the types or applications of sprites. As an example, a computer-generated graphic background
that is transformed by simple translation could be utilized either with other computer-generated
graphics objects or video graphic objects.

Process block 1104 indicates that sprite-defined objects are distinguished from the general
objects in the object information relating to each of multiple objects in the general video sequence.
The sprite-defined objects are processed in accordance with the following process steps of simplified
compression process 1100. The general objects preferably are processed in accordance with encoder
process 64 and decoder process 700 of Figs. 3A and 23A, respectively. The sprite-defined objects
may be distinguished, for example, by being labeled as such when they are created.

With regard to a video conferencing application, for example, an object corresponding to the
background of the video conference scene could be a sprite-defined object that is obtained by
imaging the background alone during a background imaging period. As another example,
computer-generated graphic objects typically are generated entirely by data or algorithms and would typically
be sprite-defined objects. In general live-action video sequences, an encoding operator could
manually label some objects as being sprite-defined.

Process block 1106 indicates that the sprite information and the transformation information
are encoded by a simplified encoding process. Preferably, the sprite information is recorded by a
conventional lossy encoding process such as wavelet or DCT encoding, as described hereinabove.
Whenever sprite-defined objects are further defined or modified by additional factors, such as
lighting or transparency factors (known as of the initial appearance of the object in the video
sequence), the additional factors would also be encoded in the same manner as the sprite. The
encoding of the sprite and additional factors may further include, for example, application of
precompression extrapolation method 400 (Figs. 17A and 17B) to provide enhanced compression or
encoding.

The trajectories or transformation information are represented as trajectories and preferably
encoded or compressed by a lossless encoding technique such as QM-coding, as described in JPEG:
Still Image Data Compression Standard, William B. Pennebaker and Joan L. Mitchell, Van Nostrand
Reinhold, New York, 1993. It will be appreciated that other lossless encoding techniques such as
Huffman encoding would be suitable. For purposes of illustration, the following description is
directed to QM-encoding.

The locations of feature points in the sprite are set as the basis of each trajectory. The x-
and y-components of each trajectory are differentially coded and concatenated before undergoing
QM-coding to form a representation or bitstream having the coordinates of the feature points or
pixels in the sprite and the coordinate differences between the feature points or pixels in the sprite
and the corresponding feature points or pixels of the corresponding object in subsequent image
frames. For a video sequence having T image frames and a sprite having N
trajectories for feature point locations represented as (x_i^(j), y_i^(j)), i = 1, ..., T, j = 1, ..., N, the
values that are preferably QM-coded are:

x_1^(1), Δx_2^(1), ..., Δx_T^(1), y_1^(1), Δy_2^(1), ..., Δy_T^(1),

x_1^(2), Δx_2^(2), ..., Δx_T^(2), y_1^(2), Δy_2^(2), ..., Δy_T^(2),

...

x_1^(N), Δx_2^(N), ..., Δx_T^(N), y_1^(N), Δy_2^(N), ..., Δy_T^(N).
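
The differential coding of one trajectory prior to QM-coding can be sketched as follows. This is an illustration under a stated assumption: each difference is taken between consecutive frames, which the description does not spell out; taking differences against the sprite coordinate instead would be a small change to the hypothetical helper `diff`. The names `encode_trajectory` and `decode_trajectory` are likewise hypothetical, and the QM-coder itself is not shown.

```python
def encode_trajectory(points):
    """points: [(x_1, y_1), ..., (x_T, y_T)] for one feature point.

    Returns [x_1, dx_2, ..., dx_T, y_1, dy_2, ..., dy_T], the symbol
    sequence that would be handed to the lossless coder.
    ASSUMPTION: each delta is relative to the previous frame.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def diff(values):
        return [values[0]] + [values[i] - values[i - 1]
                              for i in range(1, len(values))]

    # x-components first, then y-components (concatenated).
    return diff(xs) + diff(ys)


def decode_trajectory(symbols):
    """Invert encode_trajectory by accumulating the deltas."""
    half = len(symbols) // 2

    def undiff(deltas):
        out = [deltas[0]]
        for d in deltas[1:]:
            out.append(out[-1] + d)
        return out

    return list(zip(undiff(symbols[:half]), undiff(symbols[half:])))


track = [(10, 20), (12, 19), (15, 19)]   # assumed sample trajectory
coded = encode_trajectory(track)
```

Because slowly moving feature points produce small differences, the coded values cluster near zero, which is what makes a subsequent entropy coder effective.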

Process block 1108 indicates that the encoded sprite-generated objects are stored or
transmitted and retrieved or received according to typical uses of encoded or compressed digital
information, including multimedia applications, digital video archiving, network or Internet
browsing, and video transmission.

Process block 1110 indicates that the encoded sprite-generated objects retrieved from
storage or received via transmission are decoded. The encoded trajectory is QM-decoded, and the
sprite is decoded according to the format in which it was encoded (e.g., DCT).

Overview of Sprites in Object Based Video Coding

A sprite is an image composed of pixels representing a video object throughout a video
sequence. For example, a sprite representing the background of a scene throughout a video sequence
will contain all of the visible pixels of the background object throughout the entire sequence.
Portions of the background object occluded for part of the sequence and visible during the remainder
of the sequence are still used to generate the sprite for the entire sequence. Since the sprite contains
all parts of the background object that were visible at least once during the video sequence, the sprite
can be used to directly reconstruct the background objects, or it can be used for predictive coding of
the background objects.

There are two principal types of sprites: (1) off-line static sprites, and (2) on-line dynamic
sprites. Static sprites are sprites that are directly copied (including appropriate warping and
cropping) to reconstruct a video object from the sprite for a frame in a video sequence. A static sprite
is built off-line, and the texture and shape of the static sprite are coded and transmitted separately from
coding the video itself. In the off-line case, a static sprite is generated using a video object from
each frame in a video sequence.

A dynamic sprite differs from a static sprite in that it is used as a reference in predictive 
coding. In predictive coding, the motion of a video object is compensated using motion parameters 
for the sprite. A dynamic sprite is dynamically built on-line during coding in both the encoder and 
the decoder. 

Off-line static sprites are particularly suitable for synthetic objects. They are also suitable
for natural video objects that mostly undergo rigid motion. On-line dynamic sprites are used in an
enhanced predictive coding environment. In the case of natural video objects, on-line dynamic
sprites are preferred to reduce latency or to preserve details caused by local motion.

An important aspect of sprite-based coding is the method for generating sprites. The
methods for generating on-line and off-line sprites use a similar approach. In the off-line case, the
sprite generation method uses the video objects from each frame to build the sprite before the video
sequence is encoded. In the on-line case, the encoder and the decoder generate a sprite from the
current video object and a video object reconstructed for the previous frame (the previously
constructed video object or "reference object").

The process of sprite generation includes the following steps: 

1) Perform global motion estimation to determine how each pixel in a video object maps to
a corresponding pixel in the sprite;

2) Use the global motion estimation parameters to warp the video object to the sprite's 
coordinate system; and 

3) Blend the pixels of the warped object with the corresponding pixels of the sprite. 

Fig. 30 illustrates an example of video objects in a video sequence to illustrate the process of
generating a sprite. This example shows four frames (0-3), each depicting a video object, I_0 through I_3,
representing a can of juice. The arrows in frame 0 indicate the direction of motion of the juice can
throughout the video sequence. Specifically, the juice can is tumbling end-over-end and rotating
slightly about its axis shown as the dashed line. As the juice can tumbles forward, it is also rotating
slightly such that the left side of the "JUICE" label is becoming occluded and the right side of the
label is becoming visible. The images illustrated at the bottom of Fig. 30 are the sprites as they are
incrementally constructed from video objects of each frame. Each image S_0 through S_3 represents the sprite
as it is constructed from the warped video object of each frame.

Fig. 31 is a block diagram illustrating a sprite generator. The motion estimation block 1200
performs "global motion estimation" based on the video object of the current frame, the sprite
constructed from the video object in previous frames, and the masks of the current video object and
sprite. In the case of an off-line static sprite, the global motion estimation block estimates the relative
motion between the video object of the current frame and the sprite constructed from the video
objects of previous frames. The global motion estimation block performs the same function for
dynamic sprites except that it estimates motion between a video object of the current frame and a
previously constructed object.

The global motion of a video object is modeled on the basis of a geometric motion model
such as a perspective transform. Our current implementation of the sprite generator is based on the
perspective transform, but it also performs other types of transforms that are special cases of the
perspective transform, including no motion (stationary), translational motion, isotropic
magnification, rotation and translation, and an affine transformation. These transformations define
the motion of the video object relative to a sprite. Data representing each transform (the
transformation information) can be represented as a set of motion parameters. For example, if the
global motion model is based on a perspective transform, the motion parameters can be represented
as a set of motion coefficients. These motion coefficients correspond to the elements in the
perspective transform matrix used to warp a video object (i.e., points in the video object) to the
coordinate space of the sprite. As described above, the motion parameters can also be represented as
a series of trajectories or motion vectors that define how reference points in a video object move
relative to the coordinate space of the sprite.

In our implementation, the motion estimation block 1200 computes motion parameters used 
to warp each video object into the previously constructed sprite. Initially, the sprite is set to the video 
object of the first frame (frame 0) as shown in Fig. 30. Then, the motion estimation block 1200 
attempts to find the set of motion parameters that minimizes the intensity error between each pixel in 
a warped video object and its corresponding pixel in the sprite. 

The warping block 1202 shown in Fig. 31 represents the process of transforming
coordinates of pixels in a video object to coordinates in a sprite or reference object. It uses the
motion parameters to transform the video object for the current frame (the current video object) into
sprite coordinates. For example, in Fig. 30, the motion parameters for the video object I_1 are used to
warp I_1 into sprite S_0. If a perspective transform is used as the motion model, the warping block 1202
uses the perspective transform matrix formed from the motion coefficients to warp the video object
into the coordinate space of the sprite.

The blending block 1204 represents the process of combining a warped object with a sprite 
or reference object. In the case of the on-line dynamic sprite, the blending block 1204 blends the 
current object with the previously constructed object. One advantage of our implementation is that 
the blending block 1204 incrementally blends each warped video object with the sprite such that each 
of the video objects provides a substantially equal contribution to the sprite. To accomplish this, the 
blending block weights the pixels from the sprite in proportion to the number of video objects from 
previous frames from which the sprite has been constructed. This ensures that the sprite receives 
substantially the same contribution from each video object throughout a video sequence. The 
specific approach for blending sprites with warped video objects is explained in more detail below. 

Fig. 30 illustrates an example of how the sprite is constructed by blending a warped object
with the previously constructed sprite for each frame. For example, the blending block builds sprite
S_1 by blending sprite S_0 with the warped video object W_1. Note that the sprite is updated for each
frame by blending the previously constructed sprite with the warped video object for the current
frame.



As shown in Fig. 31, the sprite is stored in a frame memory 1206. The motion estimation
block 1200 accesses the sprite from the frame memory to compute the motion parameters for each
video object in the video sequence. Specifically, the motion estimation uses the masks and texture
data for the current video object and sprite to compute the motion coefficients of a 2D transform that
estimates the relative motion between the current video object and the sprite.

Having provided an overview of the sprite generation process, it is helpful to consider how
sprites and motion parameters are encoded to understand how sprites are used in object based video
coding.

Sprite Coding 

Sprites are encoded by encoding their texture, shape, and motion parameters. The texture 
and shape of a sprite can be coded using the techniques described above. 

One way to encode the motion parameters of each video object is by recording the trajectory
of reference points of the video object. For example, the motion coefficients of a perspective
transform can be encoded using the coordinates of four or more reference points of the video object
and the corresponding "warped" reference points in the sprite. Using these four pairs of reference
points, the video decoder can derive the motion coefficients of the transform used in motion
estimation.
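
One way a decoder might derive the eight coefficients from four point pairs is to multiply out the denominator of the perspective equations, which gives two linear equations per pair, and solve the resulting 8x8 system. The sketch below does this with a small Gauss-Jordan solver; the function names, the sample coefficients, and the choice of solver are assumptions made for illustration and are not taken from the description.

```python
def solve_linear(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate reference points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def perspective_from_points(src, dst):
    """src, dst: four (x, y) pairs each. Returns (a, b, c, d, e, f, g, h).

    From x'(gx + hy + 1) = ax + by + c and y'(gx + hy + 1) = dx + ey + f,
    each pair contributes two rows that are linear in the unknowns.
    """
    A, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        rhs.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        rhs.append(yp)
    return tuple(solve_linear(A, rhs))


def apply_perspective(coeffs, x, y):
    a, b, c, d, e, f, g, h = coeffs
    w = g * x + h * y + 1.0
    return (a * x + b * y + c) / w, (d * x + e * y + f) / w


# Round trip with assumed coefficients and the corners of a 10x10 box.
true_coeffs = (1.1, 0.2, 3.0, -0.1, 0.9, -2.0, 0.001, 0.002)
src = [(0, 0), (10, 0), (10, 10), (0, 10)]
dst = [apply_perspective(true_coeffs, x, y) for x, y in src]
recovered = perspective_from_points(src, dst)
```

The solver raises an error when the reference points are degenerate (for example, when three of them are collinear), since the coefficients are then not uniquely determined.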

The number of reference points needed to encode the motion coefficients of a transform
depends on the type of transform. The following description summarizes the reference points needed
to encode a perspective transform and the special cases of the perspective transform (affine;
isotropic magnification, rotation, and translation; translational motion; and stationary (no motion)).

The perspective transformation used in global motion estimation can be expressed as:
x' = (ax + by + c)/(gx + hy + 1),
y' = (dx + ey + f)/(gx + hy + 1),

where {a, b, c, d, e, f, g, h} are the motion coefficients of the transformation, (x, y) is the coordinate of a
pixel in the current video object and (x', y') is the warped coordinate. In the case of an on-line
dynamic sprite, the coordinate (x', y') is the coordinate of a pixel in a previous video object, expressed
in the coordinate system of the video object. In the case of an off-line static sprite, the coordinate
(x', y') is the coordinate in the sprite expressed in the sprite coordinate system. We refer to the
warped coordinate as the coordinate of the "corresponding pixel" in the sprite because this coordinate
identifies the location of the pixel in the sprite that corresponds to the pixel in the video object.

It takes at least 4 reference points to encode the motion coefficients of the perspective 
transform. Transforms that model less complex motion can be treated as special cases of the 
perspective transform. These transforms include isotropic magnification, rotation and translation, the 
affine transform, translation, and stationary. 

1. Affine transformation

Three points are needed (encoded) to represent (at the encoder) an affine transform, and
solve (at the decoder) for coefficients {a, b, c, d, e, f}:
x' = ax + by + c,
y' = dx + ey + f.
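
As an illustration of the decoder-side solve, the three point pairs give two independent 3x3 linear systems, one for {a, b, c} and one for {d, e, f}. The sketch below solves them with Cramer's rule; the function name `affine_from_points` and the sample points are hypothetical, and Cramer's rule is only one of several suitable solution methods.

```python
def affine_from_points(src, dst):
    """src, dst: three (x, y) pairs each. Returns (a, b, c, d, e, f)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    # Determinant of [[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]].
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("reference points are collinear")

    def solve(v1, v2, v3):
        # Cramer's rule: replace one column at a time with (v1, v2, v3).
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(*(p[0] for p in dst))  # x' equations
    d, e, f = solve(*(p[1] for p in dst))  # y' equations
    return a, b, c, d, e, f


# Assumed example: a=2, b=0.5, c=3, d=-1, e=1.5, f=4.
src_pts = [(0, 0), (1, 0), (0, 1)]
dst_pts = [(3, 4), (5, 3), (3.5, 5.5)]
coeffs = affine_from_points(src_pts, dst_pts)
```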

2. Isotropic magnification, rotation, and translation 

This type of motion can be encoded using two reference points. In this case,
g = h = 0. The transformation equation can be expressed as:

x' = α cos θ x + α sin θ y + c,
y' = -α sin θ x + α cos θ y + f.

It can be simplified to:

x' = ax + by + c,
y' = -bx + ay + f,

where a = α cos θ and b = α sin θ. Since there are four unknowns, given two pairs of points (two at
the sprite and two at a video object), the decoder can solve for the transformation coefficients in these
two equations and perform the warping.
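
A compact way to carry out this solve is to treat each point as a complex number z = x + iy. The simplified equations then read z' = (a - ib) z + (c + if), so the two point pairs determine the multiplier and the offset directly. The sketch below is one possible formulation under that observation; the function name and sample points are hypothetical.

```python
def similarity_from_points(src, dst):
    """src, dst: two (x, y) pairs each. Returns (a, b, c, f) such that
    x' = a*x + b*y + c and y' = -b*x + a*y + f."""
    z1, z2 = complex(*src[0]), complex(*src[1])
    w1, w2 = complex(*dst[0]), complex(*dst[1])
    if z1 == z2:
        raise ValueError("the two reference points must be distinct")
    m = (w1 - w2) / (z1 - z2)  # m = a - i*b
    t = w1 - m * z1            # t = c + i*f
    return m.real, -m.imag, t.real, t.imag


# Assumed example: a=0, b=2, c=1, f=-1, i.e. x' = 2y + 1, y' = -2x - 1.
params = similarity_from_points([(0, 0), (1, 0)], [(1, -1), (1, -3)])
```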

3. Translation 

Only one point is needed, and thus encoded, if the motion of a video object is translational.
In this case, we have a = e = 1 and b = d = g = h = 0, and the transform equations can be expressed
as:

x' = x + c,
y' = y + f.

At the decoder, one pair of points (one at the sprite and one at the current video object) is
enough to solve for c and f.

4. Stationary 



No reference point is needed for a stationary object. In this case:

a = e = 1 and b = c = d = f = g = h = 0, i.e., x' = x and y' = y.

The locations of the reference points are expressed in the coordinate systems of the video
objects and the sprites. The video objects and sprites each comprise an array of pixels in their
respective coordinate spaces. The pixel intensity values include luminance and chrominance values.

One convenient way to represent the location of pixels in the video objects and sprites is to
use two-dimensional index values that define the location of a pixel with respect to an origin. For
example, the origin can be defined as the top left luminance pixel. The location of a pixel can then
be expressed as a function of the two-dimensional index values. One example expression of the
location of a pixel in a video object in terms of two-dimensional index values, i and j, is as follows.
Luminance pixel in the video object:

x = i,

y = j.

Chrominance pixel in the video object:

x = 2i_c + 0.5,

y = 2j_c + 0.5,
where index values i, j, i_c, and j_c are integers.

The accuracy of the motion vectors for each pixel in a video object can be defined using a
warping accuracy parameter, s. Assuming that 1/s pixel accuracy (possible values for s are 2, 4, 8, or
16) is adopted for the motion vectors, the relation between the index values and the coordinates in a
sprite or a reference video object can be defined as follows.

The luminance pixel in a sprite or a reference video object can be expressed as:

x' = i'/s,

y' = j'/s.

The chrominance pixel in a sprite or a reference video object can be expressed as:

x' = 2i_c'/s + 0.5,

y' = 2j_c'/s + 0.5,

where index values i', j', i_c', and j_c' are integers. While the warped coordinates (x', y') can be
expressed to sub-pixel accuracy, sprites typically only have pixels on an integer pixel grid. Thus,
bilinear interpolation is used to compute the corresponding pixel values at (x', y') for pixels (x, y) in the
video object.
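
The following sketch shows how a sub-pixel luminance coordinate might be formed from integer index values and then sampled from an integer-grid sprite by bilinear interpolation. The function names are hypothetical, the image is assumed to be a list of rows of intensity values, and clamping coordinates to the image border is an assumption of this sketch rather than something the description specifies.

```python
def luminance_coordinate(i_prime, j_prime, s):
    """Convert integer index values to sprite coordinates at 1/s accuracy."""
    return i_prime / s, j_prime / s


def bilinear_sample(image, x, y):
    """Sample image (list of rows) at real-valued (x, y); x is the column.

    ASSUMPTION: coordinates outside the image are clamped to the border.
    """
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


tiny = [[0, 10],
        [20, 30]]
centre = bilinear_sample(tiny, 0.5, 0.5)
```

In this assumed 2x2 example, sampling at the centre averages the four neighbouring pixels, and an index pair (6, 3) with s = 4 corresponds to the sprite coordinate (1.5, 0.75).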

The reference points of a video object can be chosen as the corners of a bounding box of a
video object. The bounding box of a video object is a rectangular region that encloses the pixels in
the video object. Using this format, the four corners of the bounding box of the video object can be
represented by the top left corner of the bounding box, along with the width and height of the
bounding box. The motion coefficients used in the global motion model can then be derived from
the four reference coordinates in the sprite and the location and size of the bounding box of the video
object.

In summary, the motion parameters are computed using global motion estimation. The 
motion model is based on a 2D transform that warps the coordinates of pixels in a video object to 
corresponding coordinates in a sprite. The motion parameters can be encoded as a set of trajectories 
using reference points in the video object and sprite or as a set of motion coefficients. In the former 
case, the motion coefficients can be derived from the reference points. A transform function is used 
to warp a pixel in a video object to sprite space based on the motion coefficients. If the location of 
pixels in the video objects and sprites are expressed in terms of two dimensional index values as 
described above, the transform function can be expressed in terms of two dimensional index values 
in the video object and sprite. 

Implementation of the Sprite Generation Method Using Masks and Rounding Average 

This implementation of the sprite generation method includes the following three 
components: 

1) Find the motion parameters between a video object in the current frame and the
previously constructed sprite, using the motion parameters computed for the previous video
object as a starting point for the current object;

2) Warp the video object for the current frame into the previously constructed sprite using 
the motion parameters of step 1 ; and 

3) Blend the warped object with the previously constructed sprite using rounding average. 



Consider the case where a video sequence consists of n frames, where each frame i includes
an image I_i, i = 0, 1, ..., n-1, representing a video object. The implementation of the sprite
generation method can be summarized as follows:

S_0 = I_0, M_0 = I (identity transform)

for (i = 1; i < n; i++) {
    M_i = M_{i-1};
    Find perspective motion parameters M_i between I_i and S_{i-1};
    Warp I_i toward S_{i-1} using M_i to get the warped image W_i;
    Blend W_i with S_{i-1} to obtain S_i using S_i = (i S_{i-1} + W_i) / (i + 1);
}
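
The control flow of the summary above can be sketched in runnable form by passing the motion estimation and warping steps in as functions. Everything named here (`generate_sprite`, `estimate_motion`, `warp`) is hypothetical, and the usage example substitutes plain numbers for images so that only the loop structure and the blend weights are exercised; it is not a working sprite generator.

```python
def generate_sprite(objects, estimate_motion, warp, identity):
    """Incrementally build a sprite from objects I_0 ... I_{n-1}.

    estimate_motion(obj, sprite, initial) -> motion parameters M_i
    warp(obj, motion)                     -> warped object W_i
    Both callables are placeholders supplied by the caller.
    """
    sprite = objects[0]   # S_0 = I_0
    motion = identity     # M_0 = identity transform
    for i in range(1, len(objects)):
        # Start the estimate for frame i from the previous frame's motion.
        motion = estimate_motion(objects[i], sprite, motion)
        warped = warp(objects[i], motion)
        # S_i = (i * S_{i-1} + W_i) / (i + 1)
        sprite = (i * sprite + warped) / (i + 1)
    return sprite


# Stand-in usage: scalar "images", no motion, warp is a pass-through.
result = generate_sprite(
    [3.0, 5.0, 7.0],
    estimate_motion=lambda obj, sprite, initial: initial,
    warp=lambda obj, motion: obj,
    identity=None,
)
```

With the stand-in inputs the result equals the plain average of the three values, which reflects the equal weighting that the blend expression is designed to produce.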



Fig. 32 is a flow diagram illustrating the steps in this sprite generation method. The method
begins by initializing the values of the sprite and the motion coefficients as shown in step 1300. The
sprite is initially set equal to the video object of frame 0. The motion coefficients are initially set to
the identity transform.

After initializing the sprite and motion coefficients, the method incrementally builds the
sprite from the video object in each frame of the video sequence. To incrementally construct the
sprite, the method repeats a series of steps for the video object I_i in each frame. Fig. 32 represents
the iterative nature of the method using a FOR loop 1302. The first principal step for each frame is
to perform global motion estimation. In this implementation, the process of motion estimation begins
by initializing the motion coefficients using the motion coefficients of the previous frame as shown
in step 1304, and then computes the motion coefficients for the current video object as shown in step
1306. Before computing the motion coefficients M_i for the current video object I_i, the motion
coefficients are initially set to the motion coefficients M_{i-1} computed for the previous video object I_{i-1}.
The method then computes the motion coefficients M_i that estimate the motion between I_i and S_{i-1}.

Using the previous motion of an object as a starting point improves motion estimation
because the motion for the current frame is likely to be similar to the motion for the previous frame.
In estimating the motion of the video object relative to the sprite, the method is attempting to find the
set of pixels in the sprite that constitutes the best match for the pixels in the current video object.
Using the previous motion as a starting point makes this search more efficient in terms of
computational time because it reduces the number of iterations to find the best match or at least a
match with acceptable error.

After computing the motion coefficients for the current frame, the method warps the current
video object I_i to the coordinate space of S_{i-1} using the motion coefficients M_i as shown in step 1308.
The result of this step 1308 is a warped video object W_i in sprite coordinate space. The method then
blends the warped image W_i with S_{i-1} based on the following expression: S_i = (i S_{i-1} + W_i) / (i + 1).
This is illustrated in Fig. 32 as step 1310. Note that the sprite S_{i-1} is weighted in proportion to the
number of frames i from which it has been constructed. Thus, each warped image provides an equal
contribution to the final sprite.

During the blending step, the method scans each pixel in the video object and warps it to
sprite coordinates using the 2D transform based on the current motion coefficients. If the warped
pixel falls outside the mask of the sprite, it is added to the sprite. Otherwise, if it falls inside the
mask, it is blended with the sprite using the rounding average method: S_i = (i S_{i-1} + W_i) / (i + 1).
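
The per-pixel rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the method: the dictionary-based sprite, the function name `blend_into_sprite`, and the round-to-nearest integer arithmetic are all introduced here for the example (the text names a "rounding average" without spelling out the rounding rule).

```python
from typing import Dict, Iterable, Set, Tuple

Pixel = Tuple[int, int]


def blend_into_sprite(sprite: Dict[Pixel, int],
                      sprite_mask: Set[Pixel],
                      warped_pixels: Iterable[Tuple[Pixel, int]],
                      i: int) -> None:
    """Blend already-warped pixels into the sprite in place.

    sprite        maps sprite coordinates to intensities (hypothetical layout)
    sprite_mask   set of coordinates currently inside the sprite
    warped_pixels (position, intensity) pairs in sprite coordinates
    i             weight given to the existing sprite, i.e. the number of
                  frames it has been built from
    """
    for pos, value in warped_pixels:
        if pos not in sprite_mask:
            # Outside the sprite mask: the pixel is simply added.
            sprite[pos] = value
            sprite_mask.add(pos)
        else:
            # Inside the mask: S_i = (i * S_{i-1} + W_i) / (i + 1),
            # rounded to the nearest integer (assumed rounding rule).
            sprite[pos] = (i * sprite[pos] + value + (i + 1) // 2) // (i + 1)
```

Keeping the mask as a set makes the inside/outside test a constant-time lookup; a real coder would more likely use a binary mask image.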

The method repeats steps 1304-1310 for each frame in the video sequence. Thus, after the
blend operation, the method increments the frame and loops back to step 1304 to compute the motion
parameters for the next frame.
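
The loop of Fig. 32 can be summarized by the following Python sketch. It is a skeleton under stated assumptions rather than the described implementation: images are flat lists of intensities, the `estimate` and `warp` callables are hypothetical placeholders for the motion estimation and warping steps, and masks are ignored for brevity.

```python
from typing import Callable, List, Sequence

Image = List[float]    # hypothetical flat image representation
Motion = List[float]   # nine coefficients m_0 .. m_8

IDENTITY: Motion = [1.0, 0.0, 0.0,
                    0.0, 1.0, 0.0,
                    0.0, 0.0, 1.0]


def build_sprite(objects: Sequence[Image],
                 estimate: Callable[[Image, Image, Motion], Motion],
                 warp: Callable[[Image, Motion], Image]) -> Image:
    """Incrementally build a sprite from the video objects I_0 .. I_{n-1}."""
    sprite = list(objects[0])            # step 1300: sprite = object of frame 0
    motion = list(IDENTITY)              # step 1300: identity transform
    for i in range(1, len(objects)):     # FOR loop 1302
        # Steps 1304/1306: start from the previous coefficients and refine.
        motion = estimate(objects[i], sprite, motion)
        # Step 1308: warp the current object into sprite coordinates.
        warped = warp(objects[i], motion)
        # Step 1310: S_i = (i * S_{i-1} + W_i) / (i + 1).
        sprite = [(i * s + w) / (i + 1) for s, w in zip(sprite, warped)]
    return sprite
```

With a do-nothing estimator and warp, the result is the plain per-pixel mean of the frames, which is the equal-contribution property the blending step is designed to give.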

Motion Estimation

One of the principal steps in the sprite generation method is to compute the global motion
parameters for the current video object using a motion model. In our implementation, the global
motion model is based on a 2D planar perspective transform. This perspective transform maps one
image, the current video object, into another image, the previously constructed sprite or video object.

Assume that the two images under consideration are I(x, y, w) and I'(x', y', w'). Using
homogeneous coordinates, a 2D planar perspective transformation can be expressed as:

\[ \begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = M \begin{bmatrix} x \\ y \\ w \end{bmatrix} \]

where (x, y, w) is a homogeneous coordinate of a pixel in the first image I, (x', y', w') is a
homogeneous coordinate of the corresponding pixel in the second image I', and

\[ M = \begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{bmatrix} \]

is a matrix of motion parameters representing the perspective transform.



The motion parameters are sometimes referred to as warping coefficients or motion coefficients. 

The expression for the 2D perspective transform can also be written as:

\[ x'_i = \frac{m_0 x_i + m_1 y_i + m_2}{m_6 x_i + m_7 y_i + m_8}, \qquad y'_i = \frac{m_3 x_i + m_4 y_i + m_5}{m_6 x_i + m_7 y_i + m_8} \]

which expresses a warped coordinate as a function of the coordinates in the first image I.
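
As a concrete illustration of the coordinate mapping, the following Python sketch applies the nine coefficients to a single pixel position. The function name `warp_point` and the flat list layout `[m_0, ..., m_8]` are assumptions made for this example only.

```python
from typing import Sequence, Tuple

IDENTITY = [1.0, 0.0, 0.0,
            0.0, 1.0, 0.0,
            0.0, 0.0, 1.0]


def warp_point(m: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) in the first image to (x', y') in the second image."""
    d = m[6] * x + m[7] * y + m[8]   # denominator shared by x' and y'
    if d == 0:
        raise ZeroDivisionError("point maps to infinity under this transform")
    xp = (m[0] * x + m[1] * y + m[2]) / d
    yp = (m[3] * x + m[4] * y + m[5]) / d
    return xp, yp
```

For example, `warp_point([1, 0, 5, 0, 1, -2, 0, 0, 1], 3, 4)` describes a pure translation and returns `(8.0, 2.0)`, while the identity coefficients leave every point unchanged.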

The objective in computing the motion parameters is to find a set of motion parameters that
minimizes the error between pixel values in the first image I and the corresponding pixel values in the
second image I'. For each pixel in the first image I located at coordinates (x, y), its corresponding
pixel in the second image I' is located at (x', y').

The objective of the motion estimation component in this implementation is to minimize the
sum of squared intensity errors over all corresponding pairs of pixels i inside the first and second
images. In the case of an off-line static sprite, the first image I is the video object for the current
frame and the second image I' is the sprite constructed from the video objects of previous frames
(the previously constructed sprite). In the case of an on-line dynamic sprite, the first image I is the
video object for the current frame and the second image I' is the video object constructed for the
previous frame (the previously constructed video object).

Both the sprite and video object have masks that define their shape. The shape defines the
boundary (also referred to as the contour) of a non-rectangular object such as a video object or a
sprite. The error minimization problem focuses only on cases where a pixel in the first image falls
within the mask of the first image and its corresponding pixel in the second image falls within the
mask of the second image. To accomplish this, the minimization routine sets a weighting parameter
w_i to one if (x, y) is within the mask of I and (x', y') is within the mask of I'.

Thus, using this weighting parameter, the sum of squares error between pixels in the video
object and corresponding pixels in the sprite can be expressed as:

\[ E = \sum_i w_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 = \sum_i w_i e_i^2 \]

where w_i is the weighting parameter, I(x_i, y_i) is the ith pixel in the current video object, and
I'(x'_i, y'_i) is the corresponding pixel in the previously constructed sprite (or video object in the case
of a dynamic sprite).
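
The masked error measure can be illustrated with a short Python sketch. Several details are assumptions made for the example and are not taken from the text: each image is a dictionary whose keys double as its mask, the weight is treated as zero whenever either pixel is outside its mask, and the sprite is sampled at the nearest integer position rather than interpolated.

```python
from typing import Dict, Sequence, Tuple

Pixel = Tuple[int, int]


def masked_sse(video: Dict[Pixel, float],
               sprite: Dict[Pixel, float],
               m: Sequence[float]) -> float:
    """Sum of squared intensity errors over pixels inside both masks."""
    total = 0.0
    for (x, y), value in video.items():      # keys of `video` are its mask
        d = m[6] * x + m[7] * y + m[8]
        if d == 0:
            continue                         # degenerate mapping, skip pixel
        # Nearest-neighbour sampling (assumed; interpolation is also possible).
        xp = round((m[0] * x + m[1] * y + m[2]) / d)
        yp = round((m[3] * x + m[4] * y + m[5]) / d)
        if (xp, yp) not in sprite:           # outside sprite mask: weight 0
            continue
        e = sprite[(xp, yp)] - value         # e_i = I'(x', y') - I(x, y)
        total += e * e
    return total
```

Pixels that warp outside the sprite contribute nothing, so the measure compares only the region where the video object and the sprite overlap.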

A number of standard minimization algorithms can be used to solve this minimization
problem. In one implementation, we use the Levenberg-Marquardt iterative nonlinear minimization
algorithm to perform the minimization. This algorithm computes the partial derivatives of e_i with
respect to the unknown motion parameters {m_0, ..., m_8}. For example,

\[ \frac{\partial e_i}{\partial m_0} = \frac{x_i}{D_i} \frac{\partial I'}{\partial x'}, \qquad \frac{\partial e_i}{\partial m_7} = -\frac{y_i}{D_i} \left( x'_i \frac{\partial I'}{\partial x'} + y'_i \frac{\partial I'}{\partial y'} \right) \]

where D_i is the denominator m_6 x_i + m_7 y_i + m_8. From these partial derivatives, the algorithm
computes an approximate Hessian matrix A and the weighted gradient vector b with components

\[ a_{kl} = \sum_i w_i \frac{\partial e_i}{\partial m_k} \frac{\partial e_i}{\partial m_l}, \qquad b_k = -\sum_i w_i e_i \frac{\partial e_i}{\partial m_k} \]

and then updates the motion parameter estimate m by an amount \Delta m = A^{-1} b.

Fig. 33 is a flow diagram illustrating an implementation of the global motion estimation
method. The global estimation method has two principal components: 1) for each pixel i in the
video object, compute the contribution of that pixel to the Hessian matrix A and
gradient vector b (steps 1330-1340); 2) solve for \Delta m based on the current value of the Hessian
matrix and gradient vector and then update the motion coefficients (steps 1342-1348). These two
parts can be repeated a number of times to refine the motion coefficients and minimize the intensity
errors between the pixels of the video object and the corresponding pixels in the sprite.

For each pixel i inside the masks at location (x_i, y_i) (step 1330), the method computes the
corresponding position (x'_i, y'_i) in the sprite by warping the location (x_i, y_i) using the current values
of the motion coefficients. As described above, the method uses the motion coefficients from the
previous frame as a starting point. The global motion estimation method refines the motion
coefficients with each iteration.

After warping the position of a pixel using the current motion coefficients, the method
determines whether the corresponding position (x'_i, y'_i) is within the mask of the sprite (1334). If
not, it skips to the next pixel (1334).

If the corresponding position (x'_i, y'_i) is within the mask of the sprite, then the method
computes the error e_i between the pixel value at (x_i, y_i) and the pixel value at (x'_i, y'_i) (1336). Next,
it computes the partial derivative of e_i with respect to each of the motion coefficients m_k (k =
0, 1, 2, ..., 8) (1338). Finally, the method adds the pixel's contribution to A and b based on the partial
derivatives and error values computed in steps 1336 and 1338.
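
The partial derivatives of step 1338 follow from applying the chain rule to the perspective mapping. The Python sketch below is a derivation-based illustration, not code from the described implementation: the function name `error_gradient` and its argument layout are hypothetical, and the intensity gradient of the sprite at the warped position (`ix`, `iy`) is assumed to be supplied by the caller.

```python
from typing import List, Sequence


def error_gradient(m: Sequence[float], x: float, y: float,
                   ix: float, iy: float) -> List[float]:
    """Partial derivatives of e_i with respect to m_0 .. m_8.

    (x, y)   pixel position in the video object
    ix, iy   dI'/dx' and dI'/dy' at the warped position (assumed given)
    The denominator d is assumed to be non-zero.
    """
    d = m[6] * x + m[7] * y + m[8]
    xp = (m[0] * x + m[1] * y + m[2]) / d
    yp = (m[3] * x + m[4] * y + m[5]) / d
    # Common factor for the three coefficients of the denominator.
    s = -(xp * ix + yp * iy) / d
    return [x * ix / d, y * ix / d, ix / d,    # m_0, m_1, m_2
            x * iy / d, y * iy / d, iy / d,    # m_3, m_4, m_5
            x * s, y * s, s]                   # m_6, m_7, m_8
```

A quick way to gain confidence in such formulas is to compare them against finite differences of the error for a sprite with a known, constant intensity gradient.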

Once the method has computed each pixel's contribution to A and b, it then proceeds to
solve the system of equations A \Delta m = b and update the motion parameters m^{(t+1)} = m^{(t)} + \Delta m as
shown in steps 1342 and 1344. The method then repeats the entire process for the next iteration (1346).
To refine the motion coefficients, the method repeats the above process for a fixed number of steps t,
e.g., t = 8.
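
The accumulate-then-solve structure of Fig. 33 can be sketched as follows. This is a plain Gauss-Newton step written for illustration: the sample layout, the function names, the sign convention b_k = -sum(w_i e_i de_i/dm_k), and the optional damping term `lam` (a nod to Levenberg-Marquardt) are assumptions introduced here, not details confirmed by the text.

```python
from typing import List, Sequence, Tuple

# One record per pixel: (weight w_i, error e_i, gradient of e_i).
Sample = Tuple[float, float, Sequence[float]]


def accumulate_normal_equations(samples: Sequence[Sample], n: int):
    """Add each pixel's contribution to A and b (steps 1330-1340)."""
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for w, e, g in samples:
        if w == 0:
            continue                      # pixel outside one of the masks
        for k in range(n):
            b[k] -= w * e * g[k]
            for l in range(n):
                a[k][l] += w * g[k] * g[l]
    return a, b


def solve_linear_system(a: List[List[float]], b: List[float],
                        lam: float = 0.0) -> List[float]:
    """Solve (A + lam * I) dm = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for k in range(n):
        aug[k][k] += lam                  # optional damping (assumed)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try lam > 0")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    dm = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * dm[c] for c in range(r + 1, n))
        dm[r] = (aug[r][n] - s) / aug[r][r]
    return dm
```

In a full estimator these two functions would be called once per iteration, with the samples recomputed from the updated coefficients each time, for the fixed number of refinement steps.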

Warping a Video Object Using the Motion Parameters 

Once the motion parameters are obtained, the sprite generation method warps the current
video object into the previously constructed sprite. Fig. 30 provides an example of how the video
object I_i from each frame is warped into the previously constructed sprite S_{i-1}.

Blending a Video Object With a Previously Constructed Sprite

With each new frame in the video sequence, the sprite is updated by blending the current,
warped video object with the sprite constructed from previous frames. One conventional way to
blend these two images is to average the current, warped video object with the sprite. Averaging,
however, tends to put too much weight on the current video object. As a consequence, the noise in
each individual image can influence the final quality of the sprite. In order to solve this problem, our
implementation uses a rounding average method to blend the current, warped video object with the
sprite.

Assume that the video sequence includes n frames, and each frame i has a warped video
object W_i, i = 0, 1, ..., n-1. The sprite S_k is generated using W_i, i = 0, ..., k-1. The final
sprite is S_n. Preferably, the final sprite should have an equal contribution from the warped video
object of each frame such that:

S_n = (W_0 + ... + W_{n-1}) / n

Our implementation incrementally blends the registered image towards the previous sprite to
obtain the current sprite while putting equal weight on each individual image.

A specific expression of how the implementation incrementally blends each warped video
object with the sprite is as follows:

S_i = (i S_{i-1} + W_i) / (i + 1).
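
The equal-contribution property of this update can be checked by a short induction. The derivation below is a sketch that assumes the sprite is initialized with the first warped object, S_0 = W_0, and indexes the sprites by the recurrence itself; under that indexing the sprite obtained after all n frames equals (W_0 + ... + W_{n-1}) / n, the equal-weight average given in the text.

```latex
\begin{align*}
\text{Base case: } & S_0 = W_0 = \frac{1}{1}\sum_{j=0}^{0} W_j. \\
\text{Hypothesis: } & S_{i-1} = \frac{1}{i}\sum_{j=0}^{i-1} W_j. \\
\text{Step: } & S_i = \frac{i\,S_{i-1} + W_i}{i+1}
   = \frac{\sum_{j=0}^{i-1} W_j + W_i}{i+1}
   = \frac{1}{i+1}\sum_{j=0}^{i} W_j.
\end{align*}
```

Each warped object therefore carries weight 1/(i+1) after i+1 frames have been blended, whereas a plain two-way average would give the newest object weight 1/2 regardless of how many frames came before.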

Our approach to sprite generation has a number of advantages. First, it uses the mask of the
current video object and sprite to ensure that the motion coefficients are computed based only on
pixels in the video object that warp to pixels inside the sprite. Another advantage is that it uses the
motion coefficients from the previous frame as a starting point for estimating the motion of the video
object in the current frame. This makes the motion estimation process more efficient and tends to
provide more accurate motion parameters. Yet another advantage is the manner in which the method
blends video objects with the sprite or reference object using rounding average. More specifically,
the method incrementally updates the sprite (or reference object) by weighting the sprite in
proportion to the number of video objects blended into it to ensure that each video object provides
substantially the same contribution to the final sprite. This form of blending reduces the impact of
noise from the video object of each frame in the sprite.

While we have described our sprite generation method in the context of specific video
coding techniques, we do not intend to limit our invention to the coding methods described above.
As noted throughout the description above, the sprite generation method can be used as a stand-alone
module to generate off-line static sprites, as well as part of an encoder and decoder to generate
on-line, dynamic sprites. The sprite generation method can be implemented as a set of software
instructions and executed in a general purpose computer, implemented in special purpose logic
hardware such as in a digital video encoder/decoder, or implemented as a combination of software
and hardware components.

In view of the many possible embodiments to which the principles of our invention may be
applied, it should be recognized that the implementations described above are only examples of the
invention and should not be taken as a limitation on the scope of the invention. Rather, the scope of
the invention is defined by the following claims. We therefore claim as our invention all that comes
within the scope and spirit of these claims.

1. A method used in object-based video coding for generating sprites from video objects in
a video sequence, the method comprising:

computing motion parameters that estimate motion of a video object in a current frame 
relative to a previously constructed sprite; 

warping the video object into the previously constructed sprite using the motion parameters; 

and 

incrementally blending the warped video object with the previously constructed sprite by 
weighting the previously constructed sprite in proportion to the number of times that the sprite has 
been updated with a warped video object of a previous frame such that each warped video object in 
the video sequence provides substantially the same contribution to a final sprite representing the 
video object throughout a video sequence. 

2. The method of claim 1 wherein the motion parameters are a set of motion coefficients 
and the step of computing motion parameters includes attempting to find a set of motion coefficients 
that minimizes error between pixel values in the video object for a current frame with pixel values at 
corresponding pixel positions in the previously constructed sprite, where the corresponding pixel 
positions are determined by warping pixel coordinates of pixels in the video object to the coordinate 
space of the sprite. 

3. The method of claim 2 wherein the step of computing motion parameters includes 
minimizing the sum of squared intensity errors between pixel values in the video object for a current 
frame with corresponding pixel values in the previously constructed sprite. 

4. The method of claim 2 wherein the video object and the previously constructed sprite 
each have a mask to specify which pixel positions are located within the video object and sprite, 
respectively, 

and wherein the minimizing step includes using the masks to ensure that the error is
computed only for cases where a pixel in the video object has a corresponding pixel inside the mask
of the previously constructed sprite.

5. The method of claim 1 wherein the step of computing the motion parameters includes 
using the motion parameters computed for a previous frame as a starting point for computing the 
motion parameters for a current frame. 






6. The method of claim 1 wherein the blending step comprises blending the warped video
object for the current frame with a previously constructed sprite using the following expression:

S_i = (i S_{i-1} + W_i) / (i + 1),

where S_i is the sprite for the current frame, i is a weighting factor proportional to the number
of video objects from which the previously constructed sprite S_{i-1} has been constructed, and W_i is the
warped object for the current frame; and

repeating the blending step to combine warped objects from each subsequent frame with a 
sprite constructed from each previous frame. 

7. A computer-readable medium having computer executable instructions for performing 
the steps of claim 1. 

8. A method used in object-based video coding for generating sprites from video objects in a 
video sequence, the method comprising: 

computing motion parameters that estimate motion of a video object in a current frame 
relative to a previously constructed sprite by finding a set of motion parameters that minimizes 
intensity errors between pixels within a mask of the video object and corresponding pixels within a 
mask of the sprite; 

warping the video object into the previously constructed sprite using the motion parameters; 

and 

incrementally blending the warped video object with the previously constructed sprite. 

9. The method of claim 8 wherein the blending step comprises: 

weighting the previously constructed sprite in proportion to the number of times that the 
sprite has been updated with a warped video object of a previous frame such that each warped video 
object in the video sequence provides substantially the same contribution to a final sprite representing 
the video object throughout a video sequence. 

10. The method of claim 8 wherein the step of computing the motion parameters includes 
using the motion parameters of a previous frame as a starting point to finding the set of motion 
parameters that minimizes the intensity errors. 

11. A computer-readable medium having computer executable instructions for performing
the steps of claim 8. 

12. A method used in object-based video coding for generating sprites from video objects
in a video sequence, the method comprising:

computing motion coefficients of a 2D transform that estimate motion of a video object in a
current frame relative to a previously constructed sprite by finding a set of motion coefficients that
minimizes intensity errors between pixels within a mask of the video object and corresponding pixels
within a mask of the sprite, where location of the corresponding pixels in the sprite are computed by
warping the pixels in the video object with the set of motion coefficients;

warping the video object into the previously constructed sprite using the motion parameters;

and

incrementally blending the warped video object with the previously constructed sprite by
weighting the previously constructed sprite in proportion to the number of times that the sprite has
been updated with a warped video object of a previous frame such that each warped video object in
the video sequence provides substantially the same contribution to a final sprite representing the
video object throughout a video sequence.

13. The method of claim 12 wherein the 2D transform is a perspective transform. 



14. The method of claim 12 wherein the blending step comprises blending the warped
video object for the current frame with a previously constructed sprite using the following
expression:

S_i = (i S_{i-1} + W_i) / (i + 1),

where S_i is the sprite for the current frame, i is a weighting factor proportional to the number
of video objects from which the previously constructed sprite S_{i-1} has been constructed, and W_i is the
warped object for the current frame; and

repeating the blending step to combine warped objects from each subsequent frame with a 
sprite constructed from each previous frame. 

15. A computer-readable medium having computer executable instructions for performing
the steps of claim 12.

16. A sprite generator comprising:

means for computing motion parameters that map a video object in a current frame into a 
previously constructed sprite; 

means for warping the video object into the previously constructed sprite using the motion 
parameters; and 

means for incrementally blending the warped video object with the previously constructed
sprite by weighting the previously constructed sprite in proportion to the number of times that the
sprite has been updated with a warped video object of a previous frame such that each warped video
object in the video sequence provides substantially the same contribution to a final sprite representing
the video object throughout a video sequence.

17. The sprite generator of claim 16 further including frame memory for storing a sprite
constructed from the warped video object and the previously constructed sprite, wherein the frame
memory is in communication with the means for computing motion parameters and the means for
incrementally blending.

18. The sprite generator of claim 16 wherein the motion parameters are a set of motion
coefficients and the means for computing the motion parameters includes means for attempting to 
find a set of motion coefficients that minimizes error between pixel values in the video object for a 
current frame with corresponding pixel values in the previously constructed sprite. 

19. The sprite generator of claim 18 wherein means for attempting to find a set of motion 
coefficients includes means for minimizing the sum of squared intensity errors between pixel values 
in the video object for a current frame with corresponding pixel values in the previously constructed 
sprite. 

20. The sprite generator of claim 18 wherein the video object and the previously 
constructed sprite each have a mask to specify pixels located within the video object and sprite, 
respectively, 

and wherein the means for minimizing includes means for checking whether a pixel location
is inside a mask to ensure that the error is computed only for cases where a pixel in the video
object has a corresponding pixel inside the mask of the previously constructed sprite.

21. The sprite generator of claim 18 wherein the means for computing the motion
parameters includes means for using the motion parameters computed for a previous frame as a 
starting point for computing the motion parameters for a current frame. 



WO 98/59497    PCT/US98/13009

[Drawing sheets 1/34 and 2/34: no figure labels are recoverable from the extracted text.]





[Sheet 4/34, Fig. 4: block diagram. Block labels, in extraction order: EXPAND OUTLINE OF OBJECT; DEFINE AND MATCH PIXEL BLOCKS ABOUT FEATURE POINTS; CLASSIFY PIXELS BETWEEN OUTLINES; MASK -> OUTLINE; DELAY N -> N-1; DETERMINE SPARSE MOTION TRANSFORMATION; TRANSFORM MASK; DELAY N -> N-1. Legible reference numerals: 140, 158, 170, 174, 178.]

[Sheet 5/34: no figure labels are recoverable from the extracted text.]




[Sheet 6/34, Fig. 6: flow diagram. Block labels, in extraction order: SEGMENT OBJECTS; DETERMINE PIXEL BLOCK AND SEARCH AREA; IDENTIFY INITIAL PIXEL; CENTER PIXEL BLOCK ABOUT CURRENT PIXEL; PIXELS OUTSIDE OBJECT? (YES/NO); DEFINE PIXEL BLOCK TO OMIT PIXELS OUTSIDE OBJECT; IDENTIFY PRIOR CORRESPONDING PIXEL; DETERMINE MOTION VECTORS BETWEEN CORRESPONDING PIXELS; IDENTIFY NEXT CURRENT PIXEL. Legible reference numerals: 200, 206, 208, 214, 222, 224, 228, 232, 234, 236, 238.]



[Sheet 7/34, Figs. 7A and 7B: no text labels recoverable. Legible reference numerals: 202a, 212.]




[Sheet 8/34, Fig. 8: flow diagram. Block labels, in extraction order: SCAN INITIAL PIXEL BLOCK ACROSS SEARCH AREA; DETERMINE AND STORE COLUMN CORRELATIONS; DEFINE NEXT HORIZONTAL PIXEL BLOCK IN HORIZONTAL DIRECTION; SCAN NEXT (HORIZONTAL) PIXEL BLOCK ACROSS SEARCH AREA; DETERMINE COLUMN CORRELATIONS FOR NEXT COLUMN; RETRIEVE PRIOR COLUMN CORRELATIONS; DEFINE NEXT VERTICAL PIXEL BLOCK IN VERTICAL DIRECTION; SCAN NEXT (VERTICAL) PIXEL BLOCK ACROSS SEARCH AREA; DETERMINE COLUMN CORRELATIONS FROM COLUMN CORRELATIONS FOR PREVIOUS PIXEL BLOCKS IN VERTICAL DIRECTION. Legible reference numerals: 260, 268, 274, 284, 286, 290, 292, 298, 300.]



[Sheet 9/34: Fig. 9A (INITIAL BLOCK, a 5 x 5 array of pixels labeled A through Y); Fig. 9B (OBJECT, a 10 x 10 array of two-digit pixel labels); Fig. 9C (INITIAL BLOCK SCANNING OBJECT, Step 1); Fig. 9D (INITIAL BLOCK SCANNING OBJECT, Step 2). The pixel grids are not reliably recoverable from the extracted text. Legible reference numerals: 262, 264, 266, 270(1), 270(2), 270(3), 282.]



[Sheet 10/34: Fig. 9E (INITIAL BLOCK SCANNING OBJECT, Step 5); a panel titled INITIAL BLOCK SCANNING OBJECT (Step 6) whose figure label did not survive extraction; a panel titled INITIAL BLOCK SCANNING OBJECT (Step Q+5), labeled "Fig.9Q" in the extracted text. The pixel grids are not reliably recoverable. Legible reference numerals: 270(4) through 270(18), 304(4) through 304(8).]



[Sheet 11/34: Fig. 10A (SUBSEQUENT HORIZONTAL BLOCK); Fig. 10B (SUBSEQUENT HORIZONTAL BLOCK SCANNING OBJECT, Step 1); Fig. 10C (SUBSEQUENT HORIZONTAL BLOCK SCANNING OBJECT, Step 2); Fig. 10D (SUBSEQUENT HORIZONTAL BLOCK SCANNING OBJECT, Step 3). The pixel grids are not reliably recoverable from the extracted text. Legible reference numerals: 276, 278, 280, 288(2), 288(3).]



[Sheet 12/34: Fig. 10E (SUBSEQUENT HORIZONTAL BLOCK SCANNING OBJECT, Step 6); Fig. 10F (SUBSEQUENT HORIZONTAL BLOCK SCANNING OBJECT, Step Q+6). The pixel grids are not reliably recoverable from the extracted text. Legible reference numerals: 288(4), 288(5).]



[Sheet 13/34: Fig. 11A (SUBSEQUENT VERTICAL BLOCK); Fig. 11B (INITIAL BLOCK SCANNING OBJECT, Step Q+1); Fig. 11C (INITIAL BLOCK SCANNING OBJECT, Step Q+2); Fig. 11D (INITIAL BLOCK SCANNING OBJECT, Step Q+5). The pixel grids are not reliably recoverable from the extracted text. Legible reference numerals: 294, 296, 302(1) through 302(8), 304'(4) through 304'(8).]



[Sheet 14/34: Fig. 11E (INITIAL BLOCK SCANNING OBJECT, Step Q+6); Fig. 11F (INITIAL BLOCK SCANNING OBJECT, Step 2Q+5). The pixel grids are not reliably recoverable from the extracted text. Legible reference numerals: 302(9) through 302(13), 302(16) through 302(18).]



WO 98/S9497 



15/34 



PCT/US98/13009 



Fig. [figure number illegible] (flowchart 350)

352  DETERMINE DENSE MOTION ESTIMATION
354  DEFINE TRANSFORMATION BLOCK ARRAY
358  GENERATE AFFINE TRANSFORMATIONS
362  QUANTIZE AFFINE TRANSFORMATION COEFFICIENTS
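
The last two boxes of this flowchart (generate affine transformations, quantize their coefficients) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each transformation block is fitted from three non-collinear pixels and their dense motion vectors, and that quantization is uniform; the function names and the step size are hypothetical and not taken from the specification.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m * s = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three pixels are collinear")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(points, motion):
    """Fit x' = a*x + b*y + c, y' = d*x + e*y + f from three pixels.

    points: three (x, y) positions in the block (assumed selection).
    motion: the dense motion vector (dx, dy) at each of those pixels.
    """
    m = [[x, y, 1.0] for x, y in points]
    xs = [x + dx for (x, _), (dx, _) in zip(points, motion)]
    ys = [y + dy for (_, y), (_, dy) in zip(points, motion)]
    return tuple(solve3(m, xs) + solve3(m, ys))


def quantize(coefficients, step=1.0 / 256):
    """Uniform quantization of the coefficients (step size is an assumption)."""
    return tuple(round(c / step) * step for c in coefficients)
```

For example, `quantize(fit_affine([(0, 0), (8, 0), (0, 8)], [(2, 1), (2, -1), (4, 1)]))` recovers a small rotation-like transform with a translation of (2, 1).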



Fig. 14

(Figure: reference numerals 356, 364a, 364b, 364c)






Fig. 15 (flowchart 370)

372  DEFINE INITIAL TRANSFORMATION BLOCK
376  CALCULATE CURRENT SIGNAL-TO-NOISE RATIO
378  SUBDIVIDE CURRENT TRANSFORMATION BLOCK AND CALCULATE FUTURE SIGNAL-TO-NOISE RATIOS
382  IS SIGNAL-TO-NOISE RATIO DIFFERENCE GREATER THAN THRESHOLD? (YES / NO)
384  DESIGNATE EACH SUB-BLOCK THE CURRENT TRANSFORMATION BLOCK
388  DESIGNATE NEXT TRANSFORMATION BLOCK
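
The subdivision loop of Fig. 15 can be sketched as a recursive quadtree split. This is a simplified illustration under stated assumptions: the YES branch is taken to mean "keep subdividing", the caller supplies a hypothetical `snr` function that scores a list of blocks, and `min_size` is an added stopping rule that does not appear in the figure.

```python
def optimize_blocks(block, snr, threshold, min_size=2):
    """Return the list of transformation blocks chosen for one initial block.

    block: (x, y, size) of a square block.
    snr: hypothetical callable mapping a list of blocks to a signal-to-noise
         ratio for the prediction obtained with those blocks.
    threshold: minimum SNR gain that justifies a subdivision.
    """
    x, y, size = block
    half = size // 2
    if half < min_size:
        return [block]
    sub_blocks = [
        (x, y, half),
        (x + half, y, half),
        (x, y + half, half),
        (x + half, y + half, half),
    ]
    current = snr([block])      # current signal-to-noise ratio
    future = snr(sub_blocks)    # future ratio after subdividing
    if future - current > threshold:
        result = []
        for sub in sub_blocks:  # each sub-block becomes a current block
            result.extend(optimize_blocks(sub, snr, threshold, min_size))
        return result
    return [block]              # keep the block and move on to the next one
```

Scoring all four sub-blocks together is a design shortcut; an implementation could equally compare each sub-block against its parent on its own.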







Fig. 16

(Figure: reference numerals 50, 380b, 386a, 386b)






Fig. 17A (flowchart 400; continues in Fig. 17B)

DEFINE EXTRAPOLATION BLOCK BOUNDARY
ASSIGN VALUES TO PIXELS IN EXTRAPOLATION BLOCK BUT NOT OBJECT
SCAN HORIZONTAL LINES FOR PIXEL SEGMENTS WITH ASSIGNED AND UNASSIGNED VALUES
IS HORIZONTAL SEGMENT BOUNDED AT BOTH ENDS BY OBJECT PERIMETER? (YES / NO)
ASSIGN VALUES OF PERIMETER PIXELS
ASSIGN AVERAGES OF PERIMETER PIXELS
SCAN VERTICAL LINES FOR PIXEL SEGMENTS WITH ASSIGNED AND UNASSIGNED VALUES

Reference numerals on this sheet: 404, 410, 414, 422, 426, 430.






Fig. 17B (continued from Fig. 17A)

IS VERTICAL SEGMENT BOUNDED AT BOTH ENDS BY OBJECT PERIMETER? (YES / NO)
ASSIGN VALUES OF PERIMETER PIXELS
ASSIGN AVERAGES OF PERIMETER PIXELS
ASSIGN COMPOSITE PIXEL VALUES TO OVERLAPPING HORIZONTAL AND VERTICAL PIXEL SEGMENTS
ASSIGN COMPOSITE ASSIGNED VALUES TO REMAINING NON-OBJECT PIXELS

Reference numerals on this sheet: 432, 438, 448, 450.
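
The horizontal and vertical scans of Figs. 17A and 17B can be sketched as follows. The sketch assumes that a segment bounded at both ends by the object receives the average of the two perimeter pixels, that a segment bounded at one end receives that perimeter pixel's value, and that the "composite" for overlapping segments is a plain average; these readings of the flowchart, and the function names, are assumptions. The final fill of the remaining non-object pixels is left out, so such pixels stay `None`.

```python
def pad_line(values, mask):
    """Extrapolate object pixels along one scan line.

    Returns a new list; entries stay None if the line has no object pixel.
    """
    n = len(values)
    out = [values[i] if mask[i] else None for i in range(n)]
    object_positions = [i for i in range(n) if mask[i]]
    if not object_positions:
        return out
    for i in range(n):
        if mask[i]:
            continue
        left = max((j for j in object_positions if j < i), default=None)
        right = min((j for j in object_positions if j > i), default=None)
        if left is not None and right is not None:
            # Bounded at both ends: average of the two perimeter pixels.
            out[i] = (values[left] + values[right]) / 2
        else:
            # Bounded at one end only: copy that perimeter pixel.
            out[i] = values[left if left is not None else right]
    return out


def pad_block(values, mask):
    """Combine the horizontal and vertical passes over a rectangular block."""
    rows = [pad_line(v, m) for v, m in zip(values, mask)]
    cols = [pad_line(list(v), list(m))
            for v, m in zip(zip(*values), zip(*mask))]
    out = []
    for y in range(len(values)):
        row = []
        for x in range(len(values[0])):
            h, v = rows[y][x], cols[x][y]
            if mask[y][x]:
                row.append(values[y][x])      # object pixels are untouched
            elif h is not None and v is not None:
                row.append((h + v) / 2)       # overlapping segments
            else:
                row.append(h if h is not None else v)
        out.append(row)
    return out
```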






Figs. 18A, 18B, 18C, 18D

(Figures: reference numerals 420, 424, 436, 452, 454)







Fig. 21

OBTAIN DENSE MOTION VECTOR FIELD
EXTRAPOLATE DENSE MOTION VECTOR FIELD TO REGULAR CONFIGURATION
LOSSY ENCODING
LOSSLESS ENCODING
ENCODED DENSE MOTION VECTOR FIELD

Reference numerals on this sheet: 560, 562, 564, 566, 568, 570.



Fig. 22 (reference numeral 600)

98   QUANTIZED PRIOR OBJECT
602  LOSSY ENCODING
604  STORAGE
606  RETRIEVAL
608  DECODING
98   QUANTIZED PRIOR OBJECT







Fig. 23B

ENTROPY DECODER
INVERSE WAVELET CODER

Reference numerals on this sheet: 136, 704, 706, 708.







Fig. 20A

(Figure: array of pixel values; reference numerals 504, 506a, 506b, 506c)





Fig. 20C 



515 



Fig. 20B 

8 9 10 12 12 

9 10 10 11 11 

10 10 10 9 9 



512 



Fig. 20D 

8 9 10 11 12 

9 9 10 10 10.5 

10 10 10 9.5 9 



520 






Fig. 25A

OBTAIN CONTOUR
IDENTIFY INITIAL PIXEL
ASSIGN INITIAL CHAIN CODE
ASSIGN CHAIN CODE TO NEXT PIXEL
SUBSTITUTE SPECIAL CASE MODIFICATION
REMOVE INCURRED NONCONFORMAL DIRECTION CHANGES
GENERATE HUFFMAN CODE FROM CHAIN CODES

Decision branches on this sheet are labeled YES / NO. Reference numerals: 810, 816, 818, 820, 836, 838, 840, 860, 862, 864.






Fig. 24A (PRIOR ART)

(Figure: reference numerals 800, 802c, 802f, 802g)

Fig. 24B (PRIOR ART)

(Figure: labels X, A, B, C, D, E, F, G; reference numerals 804, 806)

Fig. 25B

(Figure: reference numeral 826c)






Fig. 26

START
900  RECEIVE FROM AN IMAGE SOURCE A CURRENT BITMAP SERIES
902  IDENTIFY THE FIGURES OF THE FIRST IMAGE ON CURRENT BITMAP
904  IDENTIFY THE PARTS OF THE FIGURES
906  SELECT DISTORTION POINTS FOR EACH FEATURE ON THE CURRENT BITMAP
908  SUPERIMPOSE A GRID OF TRIANGLES UPON THE PARTS OF THE CURRENT BITMAP
910  DETERMINE A CURRENT LOCATION OF EACH TRIANGLE
912  STORE THE CURRENT LOCATION OF EACH TRIANGLE TO THE STORAGE DEVICE
914  RETAIN A PORTION OF DATA DERIVED FROM THE CURRENT BITMAP THAT DEFINES THE FIRST IMAGE WITHIN THE CURRENT LOCATION OF EACH TRIANGLE ON THE CURRENT BITMAP
916  RECEIVE FROM THE CURRENT BITMAP SERIES A SUCCEEDING BITMAP
918  SUPERIMPOSE THE CURRENT GRID OF TRIANGLES ONTO THE SUCCEEDING BITMAP
920  REALIGN THE DISTORTION POINTS TO COINCIDE WITH CORRESPONDING FEATURES ON THE SUCCEEDING BITMAP
922  DETERMINE A SUCCEEDING LOCATION OF EACH TRIANGLE ON THE SUCCEEDING BITMAP
924  STORE THE SUCCEEDING LOCATION OF EACH TRIANGLE TO THE STORAGE DEVICE
926  RETAIN A PORTION OF DATA DERIVED FROM THE SUCCEEDING BITMAP THAT DEFINES THE SECOND IMAGE WITHIN THE SUCCEEDING LOCATION OF EACH TRIANGLE ON THE SUCCEEDING BITMAP
     RETAIN A PORTION OF DATA DERIVED FROM THE CURRENT BITMAP THAT DEFINES THE FIRST IMAGE WITHIN THE CURRENT LOCATION OF EACH TRIANGLE ON THE CURRENT BITMAP
932  DETERMINE AN AVERAGE IMAGE OF EACH TRIANGLE IN THE CURRENT BITMAP SERIES FROM THE SEPARATELY RETAINED DATA
934  STORE THE AVERAGE IMAGE OF EACH TRIANGLE TO THE STORAGE DEVICE
936  RETRIEVE THE CURRENT LOCATION OF EACH TRIANGLE FROM A CURRENT BITMAP
938  CALCULATE A TRANSFORMATION SOLUTION FOR TRANSFORMING THE AVERAGE IMAGE OF EACH TRIANGLE TO THE LOCATION OF EACH TRIANGLE ON THE CURRENT BITMAP
940  GENERATE A PREDICTED BITMAP
942  COMPARE THE PREDICTED BITMAP WITH THE CURRENT BITMAP
944  DETERMINE A CORRECTION BITMAP
948  STORE THE CORRECTION BITMAP
NO:  THE SUCCEEDING BITMAP BECOMES THE CURRENT BITMAP
END
RECEIVE THE SUCCEEDING BITMAP SERIES AS THE CURRENT BITMAP SERIES






Fig. 28

START
1000  RETRIEVE THE CURRENT BITMAP SERIES FROM THE STORAGE DEVICE
1002  RETRIEVE THE AVERAGE IMAGE OF EACH TRIANGLE OF THE CURRENT BITMAP SERIES FROM THE STORAGE DEVICE
1004  PASS THE AVERAGE IMAGE OF EACH TRIANGLE TO DISPLAY PROCESSOR
1006  RETRIEVE THE LOCATION OF EACH TRIANGLE ON THE CURRENT BITMAP FROM THE STORAGE DEVICE
1008  PASS THE CURRENT LOCATION OF EACH TRIANGLE TO DISPLAY PROCESSOR
1010  CALCULATE A TRANSFORMATION SOLUTION FOR TRANSFORMING THE AVERAGE IMAGE OF EACH TRIANGLE TO THE CURRENT LOCATION OF EACH TRIANGLE ON THE CURRENT IMAGE
1012  GENERATE A PREDICTED BITMAP IN THE DISPLAY PROCESSOR
1014  RETRIEVE THE CORRECTION BITMAP FOR THE CURRENT BITMAP FROM THE STORAGE DEVICE
1016  PASS THE CORRECTION BITMAP TO THE DISPLAY PROCESSOR
1018  GENERATE A DISPLAY BITMAP OF THE FIRST IMAGE IN THE DISPLAY PROCESSOR BY OVERLAYING THE PREDICTED BITMAP WITH THE CORRECTION BITMAP
NO:   THE SUCCEEDING BITMAP BECOMES THE CURRENT BITMAP
1028  THE SUCCEEDING BITMAP SERIES BECOMES THE CURRENT BITMAP
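
The relationship between the correction bitmap stored by the encoder (Fig. 26) and the display bitmap generated by the decoder (Fig. 28) can be illustrated with a minimal sketch. It assumes that the correction bitmap is a signed per-pixel difference and that "overlaying" means adding it back; the figures do not state this, and both function names are hypothetical.

```python
def correction_bitmap(current, predicted):
    """Encoder side: per-pixel difference between current and predicted."""
    return [[c - p for c, p in zip(current_row, predicted_row)]
            for current_row, predicted_row in zip(current, predicted)]


def display_bitmap(predicted, correction):
    """Decoder side: overlay the correction onto the predicted bitmap."""
    return [[p + d for p, d in zip(predicted_row, correction_row)]
            for predicted_row, correction_row in zip(predicted, correction)]
```

Under this assumption the decoder reproduces the current bitmap exactly whenever the correction bitmap is stored without loss.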






Fig. 29 (flowchart 1100)

1102  OBTAIN OBJECT INFORMATION
1104  DISTINGUISH SPRITE-DEFINED OBJECTS
1106  ENCODE SPRITES AND TRAJECTORY INFORMATION
1108  STORE / RETRIEVE OR TRANSMIT / RECEIVE ENCODED INFORMATION
1110  DECODE SPRITE-DEFINED OBJECTS






Fig. 31

VIDEO OBJECT
1200  MOTION ESTIMATION
1202  WARPING
1204  BLENDING
1206  FRAME MEMORY
SPRITE



Fig. 32

1306  COMPUTE M_i BETWEEN I_i AND S_{i-1}
1308  WARP I_i TO S_{i-1} USING M_i
1310  BLEND S_{i-1} WITH W_i:  S_i = (i * S_{i-1} + W_i) / (i + 1)
1312  NEXT FRAME
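
The blending step of Fig. 32 can be sketched in integer arithmetic. This is a minimal illustration that assumes non-negative integer pixel values, round-half-up rounding, and a mask rule in which a warped pixel is copied directly wherever the sprite has no value yet; the function name and the mask handling are assumptions rather than the method as claimed.

```python
def blend_rounding_average(sprite, sprite_mask, warped, warped_mask, i):
    """Blend warped object W_i into sprite S_{i-1}; return (S_i, new mask).

    sprite, warped: 2-D lists of non-negative integer pixel values.
    sprite_mask, warped_mask: 2-D lists of booleans (True = pixel is valid).
    i: number of frames already accumulated in the sprite.
    """
    out = [row[:] for row in sprite]
    out_mask = [row[:] for row in sprite_mask]
    for y in range(len(sprite)):
        for x in range(len(sprite[0])):
            if not warped_mask[y][x]:
                continue  # this frame contributes nothing at this position
            if sprite_mask[y][x]:
                total = i * sprite[y][x] + warped[y][x]
                # Integer division with rounding to the nearest value.
                out[y][x] = (total + (i + 1) // 2) // (i + 1)
            else:
                out[y][x] = warped[y][x]
                out_mask[y][x] = True
    return out, out_mask
```

Weighting the running sprite by `i` is what keeps the contribution of every frame roughly equal; a plain two-way average would instead favor the most recent frames.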






Fig. 33

START
FOR EACH PIXEL INSIDE MASK OF I
FIND POSITION OF CORRESPONDING PIXEL IN S
POSITION INSIDE MASK OF S? (YES / NO)
COMPUTE e
COMPUTE PARTIAL DERIVATIVE OF e WITH RESPECT TO M_0 ... M_8
ADD PIXEL'S CONTRIBUTION TO A AND b
NEXT PIXEL
SOLVE SYSTEM OF EQUATIONS  A ΔM = b
UPDATE MOTION PARAMETERS  M^(t+1) = M^(t) + ΔM

Reference numerals on this sheet: 1330, 1332, 1334, 1336, 1338, 1340, 1342, 1344, 1346, 1348.
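
The loop of Fig. 33 (accumulate A and b over the masked pixels, solve A ΔM = b, update M) can be sketched for the simplest possible motion model. The sketch below is a deliberate simplification: it estimates only a two-parameter translation rather than the full 2D transform, uses bilinear sampling and central differences for the sprite gradient, and runs a fixed number of Gauss-Newton iterations. All names and these choices are assumptions made for illustration.

```python
def sample(img, x, y):
    """Bilinear sample; the caller keeps (x, y) inside the interior."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bottom = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def estimate_translation(obj, obj_mask, sprite, sprite_mask,
                         start=(0.0, 0.0), iterations=20):
    """Estimate (tx, ty) so that obj(x, y) matches sprite(x + tx, y + ty)."""
    tx, ty = start  # e.g. the motion found for the previous frame
    height, width = len(sprite), len(sprite[0])
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for y, row in enumerate(obj):
            for x, value in enumerate(row):
                if not obj_mask[y][x]:
                    continue
                u, v = x + tx, y + ty  # corresponding position in sprite
                # One-pixel margin so the central differences stay in range.
                if not (1 <= u < width - 2 and 1 <= v < height - 2):
                    continue
                if not sprite_mask[int(v)][int(u)]:
                    continue
                e = value - sample(sprite, u, v)  # intensity error
                gx = (sample(sprite, u + 1, v) - sample(sprite, u - 1, v)) / 2
                gy = (sample(sprite, u, v + 1) - sample(sprite, u, v - 1)) / 2
                # Add this pixel's contribution to A and b.
                a11 += gx * gx
                a12 += gx * gy
                a22 += gy * gy
                b1 += e * gx
                b2 += e * gy
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate system, keep the current estimate
        tx += (a22 * b1 - a12 * b2) / det  # solve the 2x2 system
        ty += (a11 * b2 - a12 * b1) / det
    return tx, ty
```

Passing the previous frame's result as `start` mirrors the abstract's use of the earlier motion coefficients as the starting point for the minimization.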



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US98/13009 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC(6) :H04N 7/32 

US CL :348/699 
According to International Patent Class' 



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols) 
U.S. : 348/699, 384, 390, 400, 401, 402, 409, 413, 415, 416; 382/190, 203, 232, 236, 238, 241, 243 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

A           US 5,136,659 A (KANEKO et al.) 04 AUGUST 1992, Fig. 2.                                1-21

A, P        US 5,654,771 A (TEKALP et al.) 05 AUGUST 1997, col. 3, lines 21-61.                   1-21



□ 

Further documents are listed in the continuation of Box C. See patent family annex. 



Special categories of cited documents: 

document defining the general state of the art which is not considered 
to be of particular relevance 

earlier document published on or after the international filing date 

document which may throw doubts on priority claim(s) or which is 
cited to establish the publication date of another citation or other 
special reason (as specified) 

document referring to an oral disclosure, use, exhibition or other 
means 

document published prior to the international filing date but later than 
the priority date claimed 

later document published after the international filing date or priority 
date and not in conflict with the application but cited to understand the 
principle or theory underlying the invention 

document of particular relevance; the claimed invention cannot be 
considered novel or cannot be considered to involve an inventive step 
when the document is taken alone 

document of particular relevance; the claimed invention cannot be 
considered to involve an inventive step when the document is 
combined with one or more other such documents, such combination 
being obvious to a person skilled in the art 

document member of the same patent family 



Date of the actual completion of the international search 
24 SEPTEMBER 1998 

Date of mailing of the international search report 
22 OCT 1998 

Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 
Washington, D.C. 20231 
Facsimile No. (703) 305-3230 

Authorized officer 
HOWARD W. BRITTON 
Telephone No. (703) 305-472* 

Form PCT/ISA/210 (second sheet)(July 1992)* 



